SECTION I
INTRODUCTION 


VISUAL  SIMULATION 

The function of a visual simulation system in a training device is to
present a view of a simulated real world environment to a weapon system
operator trainee. Visual simulation technology can be categorized into two
broad technology areas: image generators and displays. An image generator
accepts information regarding the viewpoint and viewing direction of the
observer and creates the simulated real world imagery in a format suitable
for the display system. The display system then presents the view to the
observer. Image generators contain a physical or mathematical model of a
real world environment from which the required view or scene information is
obtained. An example of an image generator which uses a physical model is a
system which employs a television camera and a three-dimensional scaled
terrain model board to simulate a pilot's view as he flies over the terrain.
A computer image generator (CIG), on the other hand, processes a mathematical
model of the visual environment to produce the required scene information.

Prior to the advent of CIG technology, the camera-modelboard type of system
dominated image generator technology. The reasons for the current trend
toward CIG technology have been summarized by Wekwerth1. The areas of
comparison included: depth of focus (limited with a camera at low altitudes),
stability (degradation of electromechanical components in a camera system),
gaming area (modelboard size and scale are restricted), flexibility (changing
environments in a CIG system is easier than changing a modelboard),
and power consumption (150 kW for a modelboard vs. 15 kW for CIG), among a
dozen other reasons. Other investigators (O'Connor2, Monroe3, Thorpe4) have
pointed out the advantages of CIG systems in terms of training effectiveness.


1. Wekwerth, M., "The Lufthansa Day/Night Computer Generated Visual System",
in AGARD Conference Proc. No. 249 (ADA063850), pp. 12-1, 12-6, April 1978.

2. O'Connor, F.; Shinn, J.; and Bunker, W., "Prospects, Problems, and
Performance: A Case Study of the First Pilot Trainer Using CGI Visuals",
in Proc. of Sixth NTEC/Industry Conference, pp. 55-83, November 1973.

3. Monroe, E., "Air to Surface Full Mission Simulation by the ASUPT System",
in Proc. of 9th NTEC/Industry Conference, pp. 41-48, November 1976.

4. Thorpe, J.; Varney, N.; McFadden, R.; LeMaster, W.; and Short, L., "Training
Effectiveness of Three Types of Visual Systems For KC-135 Flight Simulators",
Air Force Human Resources Laboratory, Flying Training Division Report
AFHRL-TR-78-16, June 1978.

COMPUTER IMAGE GENERATION SYSTEMS

The real-time CIG systems currently employed in visual simulation systems
resulted as an outgrowth of the field of computer graphics. Newman5 provides
an excellent source for review of the mathematics and algorithms utilized in
computer graphics. Non-real-time computer graphics research is primarily
directed toward creating more realistic computer generated scenes (Crow6,
Csuri7) with little regard to the amount of computation time and hardware
required. Real-time CIG research is primarily directed toward the same end
within the hardware and time constraints of a real-time system. (A real-time
CIG creates a complete new scene every 1/30 second with a pipeline
computation time of less than 1/10 second.) Morland8 describes the design and
capabilities of the CIG system developed by General Electric for the
NAVTRAEQUIPCEN Visual Technology Research System. Woomer9 describes an
implementation of a calligraphic CIG system developed by McDonnell Douglas.
Schumacker10 compares calligraphic CIG to raster CIG. Potential improvements
to the state of the art of real-time CIG systems are described by Bunker11,
Marconi12, and Swallow13.


5. Newman, W. and Sproull, R., "Principles of Interactive Computer Graphics",
2nd Edition, McGraw-Hill Book Company, 1979.

6. Crow, F., "Shaded Computer Graphics in the Entertainment Industry", in
Tutorial on Computer Graphics, IEEE Catalog No. EHO-147-9, 1979.

7. Csuri, C., "Computer Graphics and Art", in Tutorial on Computer Graphics,
IEEE Catalog No. EHO-147-9, pp. 421-433, 1979.

8. Morland, D., "System Description - Aviation Wide-Angle Visual System
(AWAVS) Computer Image Generator (CIG) Visual System", Technical Report
NAVTRAEQUIPCEN 76-C-0048-1, Naval Training Equipment Center, Orlando,
Florida, February 1979.

9. Woomer, C. and Williams, R., "Environmental Requirements for Simulated
Helicopter/VTOL Operations From Small Ships and Carriers", in AGARD Conf.
Proc. No. 249, Piloted Aircraft Environment Simulation Technologies,
ADA063850, October 1978.

10. Schumacker, R. and Rougelot, R., "Image Quality: A Comparison of Night/
Dusk and Day/Night CGI Systems", in Proceedings of the 1977 Image Conference
held at Williams AFB, Arizona, 17-18 May 1977, pp. 243-255.

11. Bunker, W., "Computer Image Generation Imagery Improvement: Circles,
Contours, and Texture", Technical Report AFHRL-TR-77-66, Advanced Systems
Division, Air Force Human Resources Laboratory, Wright-Patterson Air Force
Base, Ohio, September 1977.

12. Marconi Radar Systems Limited, Product Brochure, "A Picture Generator
for Flight Simulators".

13. Swallow, R., "Computrol Computer Generated Day/Dusk/Night Image Display",
in Proceedings of 11th NTEC/Industry Conference, pp. 321-331, November 1978.

CIG technology is rapidly growing and the capability to process a complex
environment in real-time is a reality. However, the process to create the
complex environment model is currently labor intensive and expensive. The
purpose of this report is to propose techniques to make the environmental data
base creation process more efficient.

DATA  BASE  CONTENT 

The basic information stored in the environmental model or data base is
geometry and appearance. The specific requirements as to the size of the
gaming area, the minimum size of details in the gaming area, the number of
details in a given scene, and the required fidelity to the real world are
strongly influenced by the tasks required for the specific mission being
trained. In many cases the requirements are unknown. Often there is a need
for the data base to represent actual real world areas rather than generic
areas. For example, if the task is to navigate a ship in Norfolk Harbor, the
data base should represent Norfolk Harbor. Many training tasks require that
the weapon system operator use a variety of sensor systems. In these cases
the data bases must correlate. For example, the radar data base should have
features located in the same geographic position as the visual data base.
Hoog14,15 and Basinger16 discuss the general requirements for a data base and
make a good case for the use of information which represents the real world
(DMA17) as a framework from which CIG data bases can be built.

DATA  BASE  CONSTRUCTION 

The  structure  or  form  in  which  the  data  is  organized  is  a  function  of 
the CIG processing technique. Sutherland18 classifies the various processing


14. Hoog, T.; Dahlberg, R.; and Robinson, R., "Project 1183: An Evaluation
of Digital Radar Landmass Simulation", in Proceedings of NTEC/Industry
Conference, NAVTRAEQUIPCEN IH-240, pp. 54-79, November 1974.

15. Hoog, T. and Stengel, J., "Computer Image Generation Using the Defense
Mapping Agency Digital Data Base", in Proc. of the 1977 Image Conference
at Williams Air Force Base, pp. 203-218, May 1977.

16. Basinger, J. and Ingle, S., "Data Base Requirements for Full Mission
Simulation", in Proceedings of the 1977 Image Conference, Air Force Human
Resources Laboratory, Flying Training Division, Williams AFB, Arizona,
pp. 25-33, May 1977.

17. Defense Mapping Agency, "Product Specifications for Digital Landmass
System (DLMS) Data Base", PS/ICD-E-F-G/100, July 1977.

18. Sutherland, I.; Sproull, R.; and Schumacker, R., "Characterization of
Ten Hidden-Surface Algorithms", Computing Surveys, Vol. 6, No. 1, pp.
1-55, March 1974.

algorithms which utilize data bases in which the information is stored as planar
polygons. In current real-time CIG systems polygon models are the basic data
structure. Monroe19 describes the techniques utilized in the construction of
a polygon data base. Kotas20 describes the polygon data base construction
facility at NAVTRAEQUIPCEN.

REPORT  SUMMARY 

This report is primarily the result of a literature search and is not
meant to be an in-depth discussion of the subjects covered. The prime purpose
was to provide an overview of the problems involved in CIG data base
construction and to discuss the technologies which are pertinent to the
automation of data base development. In Section II of this report a
description of modeling criteria in terms of scene detail is proposed.
Section III describes the various data base structures used in computer
graphics, with the understanding that real-time CIG systems currently use
polygon representations but future CIG systems might require different data
base structures if only to make the modeling task more efficient. Section IV
discusses stereophotogrammetric and digital image processing techniques for
extracting CIG data base information from photographs. Section V describes
the components of a modeling system in terms of the hardware necessary to
implement stereophotogrammetric and digital image processing of imagery for
CIG data base development.


19. Monroe, E., "Environmental Data Base Development Process for the ASUPT
CIG System", Air Force Human Resources Laboratory, Technical Report
AFHRL-TR-75-24, August 1975.

20. Kotas, J. and Booker, J., "The AWAVS Data Base Facility - A Comprehensive
Preparation Package", in Proc. of 11th NTEC/Industry Conference, pp.
49-62, November 1978.

SECTION II

SCENE  DETAIL  REQUIREMENTS 


INTRODUCTION

In this Section an attempt is made to identify and quantify the information
which is operated on by the CIG system to produce a simulated visual environment.
This information includes geometric information such as size, shape, and
location as well as the less easily defined modes of appearance such as
brightness, hue, saturation (these three can also be called spectral
luminance), transparency, and glossiness. The scene illumination also affects
the appearance. Illumination has spectral properties and objects in the scene
have reflectance properties which are a function of color and direction
(OSA21).

This Section discusses the difference between seeing and perceiving,
the capabilities of the eye, scene parameters affecting the performance of
certain visual tasks, and recommendations for scene detail requirements.

SEEING  VERSUS  PERCEIVING 

The purpose of a weapon system trainer is to provide an environment
which will teach and exercise an operator in those skills required in the
performance of his mission tasks in the actual weapon system. Since an
operator's performance is based on his perception of his environment, the
simulated environment should be perceptually similar to the real world
environment. The visual simulation system in a weapon system trainer provides
a visual environment to the operator which should be perceptually similar to
the real world visual environment. In a CIG visual simulation system the
operator's perception of his environment can be considered to originate in
the data base, to proceed through the image generation and the display
system, to be seen by the observer's eyes, and finally to be operated on by
the observer's perception process (involving his memory, emotional state,
and concentration) to yield his perception. The information rate of the eye-
brain perception process has been estimated at 5 x 10^3 bits/second (Sagan22).

If  this  process  could  be  accurately  determined  any  visual  environment  could 
be  perceptually  replicated  at  this  relatively  low  information  data  rate. 
Unfortunately,  the  perception  process  is  difficult  to  analyze  and  quantify. 
Consequently,  a  visual  simulation  system  attempts  to  replicate  what  the  eye 
can  see  in  the  real  world  with  sufficient  similarity  such  that  the  perception 
is  functionally  identical  to  the  observer's  perception  in  the  real  world  as 
measured  by  training  transfer.  There  is  no  conclusive  research  as  to  the 
required  degree  of  realism  or  fidelity  necessary  to  train.  In  order  to  be 


21. Optical Society of America, "The Science of Color", Optical Society of
America, Washington, D.C., 1963.

22. Sagan, C., "The Dragons of Eden", Ballantine Books, New York, 1977.

confident that positive transfer of training is occurring, two general rules
are usually followed (Hamilton23): maximize the similarity between the
simulated and operational environments, and provide adequate experience with
the task.

Another  driving  force  behind  emphasizing  perceptual  fidelity  as  opposed 
to  realism  is  the  high  cost  of  realism.  Replication  of  all  sensible  attri¬ 
butes  of  the  real  world  is  potentially  possible  but  also  very  expensive. 

Although the concept of perceptual fidelity has been voiced before
(Wood24), the design and specification of visual simulation systems will
continue to be based on physical fidelity to the real world until those
trade-offs on realism required for specific training transfer have been
quantitatively identified. For example, Welch25 states that good texture and
parallax cues are sufficient for piloting training in the nap-of-the-earth
(NOE) mission but that navigation training requires a much more complex set
of topographic, hydrographic, and botanical cues. The visual cues required
for the simulation of the full NOE mission almost defy analysis. Gibson26
points out that the visual stimulus need only be a correlate of the real
world property, not a copy of it, in order for the perception to be the same.
Bunker27 describes an example in which parallel converging lines serve the
same function as a texture gradient to produce a perception of depth in a
visual simulation.

In this Section the performance parameters of the eye will be reviewed
as well as some perceptual parameters which have been measured for specific
visual tasks. It is recommended that data base content be based on perceptual
fidelity as a goal. However, it must be kept in mind that many tasks have not
been studied sufficiently to determine just what scene qualities are necessary
to produce perceptual fidelity. In cases such as NOE navigation it may be
necessary to have all of the visual fidelity of the real world simply
because the trade-offs are unknown.


23. Hamilton, H., "Feasibility Study for Simulation of an Airport Tower
Control Environment", ADA051174, February 1978.

24. Wood, M., "The Fidelity Issue in Visual Simulation", in Proc. of the 1977
Image Conference, Williams AFB, pp. 291-295, May 1977.

25. Welch, B., "Recent Advances in Television Visual Systems", in AGARD
Conference Proceedings No. 249, ADA063850, pp. 13-1, 13-7, April 1978.

26. Gibson, J., "The Perception of the Visual World", Houghton-Mifflin Company,
Boston, 1950.

27. Bunker, W., "Training Effectiveness Versus Simulation Realism", SPIE, Vol.
162, Visual Simulation and Image Realism, pp. 76-82, August 1978.

VISUAL CAPABILITIES

The performance capability of the eye has been extensively studied and
reviewed many times and reported elsewhere (Booth28, Carel29, Farrell30).
In order to demonstrate the magnitude of the problem in trying to replicate
the visual environment, some of the capabilities of the eye are briefly
summarized in the following paragraphs.

Acuity. Acuity is defined as the reciprocal of the angle, measured in
arc minutes, of the smallest detail which can be resolved. Acuity varies
with luminance, color, contrast, and position in the field of view (LeGrand31).
For high contrast targets, viewed on-axis, the minimum separable acuity at
10 FTL (foot-lamberts) is 2.0. This corresponds to a bar target with an
angular frequency of one line pair per arc minute. Vernier acuity, which is
the ability to see a misalignment in a line, and stereo acuity, which is the
ability to see the angular disparity due to the eye separation distance, are
both approximately 0.04 arc minutes. The minimum perceptible angular subtense
of a non-luminous detail is approximately 0.007 arc minutes.

The above acuity thresholds can be combined with the closest approach
distance to be simulated to give an idea of the size of details which the
eye is capable of seeing in the real world. The minimum perceptible acuity
criterion allows power lines to be seen against a uniform sky. Under ideal
conditions a power line only a half inch thick can be seen at a range of
three miles. At a range of 5 meters a spider web strand only 10 microns
thick can be seen. Vernier acuity thresholds indicate that breaks in edges
due to misalignment of two juxtaposed displays can be seen with misalignments
as small as 34 microns on a screen located 3 meters from the observer.
Stereo acuity becomes important in a stereo display system in which separate
displays are computed for the viewpoint of each eye. This has implications
for the precision with which a viewpoint is located for scene computation.
For example, to replicate the stereo capability of an observer viewing an
object located 5 meters away, the viewpoint positions must be precise to a
linear dimension of 34 microns in real world coordinates.
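
The size estimates in this paragraph follow from small-angle geometry. The
short Python sketch below is an illustration added for clarity and is not
part of the original analysis. The function names are hypothetical; the
thresholds are the values quoted in the text (0.007 arc minutes minimum
perceptible, 0.04 arc minutes vernier).

```python
import math

ARCMIN_TO_RAD = math.pi / (180.0 * 60.0)  # one arc minute in radians

# Thresholds quoted in the text (arc minutes).
MIN_PERCEPTIBLE_ARCMIN = 0.007
VERNIER_ARCMIN = 0.04


def subtended_size_m(angle_arcmin, distance_m):
    """Linear size (meters) that subtends angle_arcmin at distance_m."""
    return distance_m * math.tan(angle_arcmin * ARCMIN_TO_RAD)


def subtended_angle_arcmin(size_m, distance_m):
    """Angle (arc minutes) subtended by an object of size_m at distance_m."""
    return math.atan(size_m / distance_m) / ARCMIN_TO_RAD


if __name__ == "__main__":
    # Spider web strand at 5 meters (minimum perceptible threshold).
    web_um = subtended_size_m(MIN_PERCEPTIBLE_ARCMIN, 5.0) * 1e6
    # Display misalignment on a screen 3 meters away (vernier threshold).
    gap_um = subtended_size_m(VERNIER_ARCMIN, 3.0) * 1e6
    # Half-inch power line at three statute miles.
    line_arcmin = subtended_angle_arcmin(0.5 * 0.0254, 3 * 1609.344)
    print(f"thinnest visible strand at 5 m: {web_um:.1f} microns")
    print(f"visible misalignment at 3 m: {gap_um:.1f} microns")
    print(f"half-inch line at three miles: {line_arcmin:.4f} arc minutes")
```

Run as a script, this gives roughly 10 microns, 35 microns, and 0.009 arc
minutes, which agrees with the figures quoted above to within rounding.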


28. Booth, J. and Farrell, R., "Overview of Human Engineering Considerations
for Electro-Optical Displays", SPIE, Vol. 199, pp. 78-108, August 1979.

29. Carel, W.; Herman, J.; and Olzak, L., "Design Criteria for Imaging Sensor
Displays", ADA055411, May 1978.

30. Farrell, R. and Booth, J., "Design Handbook for Imagery Interpretation
Equipment", Boeing Aerospace Company, Seattle, Washington, December 1975.

31. LeGrand, Y., "Form and Space Vision", Indiana University Press, Bloomington,
1967.

The minimum separable acuity threshold is the one most often used as
the ultimate goal in a visual display system. As evidenced by the above
discussion, a data base minimum detail dimension criterion based on a minimum
separable acuity threshold would not replicate the potentially visible
environment. The minimum separable threshold applies to a large percentage
of, but not all, visual tasks. Minimum separable acuity is that visual
performance parameter which is used to read the letters in an eye chart.
For example, 20/20 vision as measured on a Snellen chart corresponds to a
separable acuity of 1.0 or a resolution of 2 arc minutes/line pair. A person
with 20/20 vision is capable of reading letters whose lines or gaps subtend
1 arc minute or approximately 1/16 inch at 20 feet.

Luminance. The range of light levels to which the eye can respond
extends from 10^-6 FTL to 10^4 FTL, or approximately 10 orders of magnitude.
However, at any one time the eye is limited to approximately two orders of
magnitude of luminance discrimination due to the brightness-adaptation
mechanism of the eye (Cornsweet32). Consider a sunlit environment. The
adaptation level adjusts to its maximum range. All luminances above 10^4 FTL
are seen as white; all luminances below 10^2 FTL are seen as black. Now
consider a dark interior or an overcast night. The eye adapts to its minimum
range. All luminances below 10^-6 FTL are black while all above 10^-4 FTL are
white (assuming the eye is not allowed to adapt to luminances higher than
10^-4 FTL). Since display systems typically are restricted to a dynamic range
of 100:1 or less, CIG systems have generally computed display information
over this same range. If, however, visual environment simulations are to
include the effects of adaptation to different luminance levels while
maintaining the dynamic range, then the computation of pixel luminance in the
display should be carried out over the entire range of luminances consistent
with the dynamic scenario. For example, consider a battlefield scenario on a
cloudy, moonless night. The display system has a highlight brightness of
10 FTL and a dark level of 0.1 FTL. The simulated scene has absolute
luminance levels extending from 10^-6 to 10^-4 FTL, which are effectively
simulated by the display: 10^-4 FTL is called white and displayed at 10 FTL,
while 10^-6 FTL is called black and displayed at 0.1 FTL. (For illustration,
contrast effects have been ignored.) A "white" object is seen against a
"black" treeline. Now a parachute flare ignites behind the treeline with the
"white" object in shadow. In the real world the eye would adapt to the new
luminance level (call it 10^4 FTL) and the previously "white" object would
appear black while the tops of the trees which were black are now illuminated
by the flare and appear white. If the dynamic range of luminance computation
is restricted to two orders of magnitude this situation could not be
effectively simulated. The same reasoning applies to less extreme examples
such as a pop-up maneuver from a small clearing in a dense forest or the
effect of headlights or search lights. Note that the display dynamic range
is not at issue, just the computational luminance range.
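
The adaptation argument can be made concrete with a small Python sketch. It
is an illustration only: the text fixes just the end points of the mapping
(adapted "white" shown at 10 FTL, adapted "black" at 0.1 FTL), so the
logarithmic interpolation between them is an assumption, and the function
and parameter names are hypothetical.

```python
import math


def display_luminance(scene_ftl, adapt_white_ftl,
                      display_white_ftl=10.0, display_black_ftl=0.1,
                      decades=2.0):
    """Map an absolute scene luminance (FTL) to a display luminance (FTL).

    adapt_white_ftl is the scene luminance the adapted eye sees as white;
    anything `decades` orders of magnitude below it is seen as black.
    The log-linear interpolation in between is an assumed model.
    """
    adapt_black_ftl = adapt_white_ftl / (10.0 ** decades)
    # Clamp to the adaptation window: brighter is white, darker is black.
    clamped = min(max(scene_ftl, adapt_black_ftl), adapt_white_ftl)
    frac = math.log10(clamped / adapt_black_ftl) / decades
    lo = math.log10(display_black_ftl)
    hi = math.log10(display_white_ftl)
    return 10.0 ** (lo + frac * (hi - lo))


if __name__ == "__main__":
    # Before the flare: eye adapted to a 1e-6 .. 1e-4 FTL night scene.
    print(display_luminance(1e-4, adapt_white_ftl=1e-4))  # "white" object
    # After the flare: eye adapted to 1e2 .. 1e4 FTL.
    print(display_luminance(1e-4, adapt_white_ftl=1e4))   # same object
    print(display_luminance(1e4, adapt_white_ftl=1e4))    # flare-lit treetops
```

Under these assumptions the same 10^-4 FTL object is displayed near 10 FTL
before the flare and near 0.1 FTL after it, which is only possible if the
scene luminances are carried over the full absolute range before the
adaptation window is applied.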


32. Cornsweet, T., "Visual Perception", Academic Press, New York, 1970.

Contrast. The perception of luminance differences is a function of color
and luminance level. The problem of modeling observable color differences
is complex and beyond the scope of this report. The interested reader is
referred to MacAdam33 and Hunt34. Contrast sensitivities to luminance level
differences can be measured by observing a uniformly lit screen of luminance
B. A sharp edged area within the screen has additional luminance of ΔB.
The luminance ΔB is increased from zero until it is just noticeable. The
just noticeable difference ΔB is measured as a function of B. The quantity
ΔB/B is called the Weber Ratio (Gonzalez35). This quantity for typical
display luminance ranges is approximately 10% at 0.1 FTL, decreasing to
2% at 1 FTL and remaining fairly constant at 2% up to 10 FTL.

In terms of absolute luminance levels the Weber fraction increases to 10
at luminance levels of 10^-3 FTL or less, allowing the discrimination of only
two or three gray levels. To more accurately simulate the situation described
above, the "white" object might be assigned a display luminance value of 5.1
FTL while the black trees are displayed at 5 FTL.

Note that luminance difference thresholds are a function of luminance
level. Since CIG systems treat luminances in a linear, digitized fashion for
computational purposes, the computations are carried out with fixed luminance
differences. If the appearance of the resultant display is to replicate the
eye's capability then the fixed luminance difference should be equal to the
smallest luminance difference observable. This would lead to luminance steps
of 0.01 FTL or about 1000 steps to span the display range of 0.1 to 10.0 FTL.
In practice luminance computations carried to 8 bit accuracy (256 steps) are
usually acceptable. If the entire dynamic range of eye perceivable luminance
levels is to be simulated (as discussed above) then the smallest perceptible
luminance level is approximately 10^-4 FTL, requiring 10^6 steps and 20 bit
accuracy.
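
The step and bit counts above can be checked with a few lines of Python.
This is a sketch for illustration only; the helper names are hypothetical,
and the inputs are the Weber ratio and luminance values quoted in the text.

```python
def jnd_ftl(luminance_ftl, weber_ratio):
    """Just noticeable luminance difference (FTL) at a given luminance."""
    return luminance_ftl * weber_ratio


def linear_steps(l_min_ftl, l_max_ftl, step_ftl):
    """Number of fixed-size steps needed to span l_min_ftl..l_max_ftl."""
    return round((l_max_ftl - l_min_ftl) / step_ftl)


def bits_for_steps(steps):
    """Smallest word length (bits) that can encode `steps` distinct levels."""
    return max(1, (steps - 1).bit_length())


if __name__ == "__main__":
    step = jnd_ftl(0.1, 0.10)               # 10% Weber ratio at 0.1 FTL
    steps = linear_steps(0.1, 10.0, step)   # fixed 0.01 FTL steps
    print(steps, bits_for_steps(steps))     # display range, linear coding
    print(bits_for_steps(256))              # the usual 8 bit practice
    print(bits_for_steps(10**6))            # 10^6 steps quoted in the text
```

With a fixed 0.01 FTL step the 0.1 to 10 FTL display range needs 990 steps
(the "about 1000" above), which fits in 10 bits, and 10^6 steps need 20 bits.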

In  a  color  display  formed  by  the  combination  of  three  separately  modulated 
colors  the  above  analysis  is  applicable  with  some  modification.  A  predominantly 
red  color  can  be  distinguished  from  another  predominantly  red  color  at  the 
same  luminance  level  with  a  change  in  the  red  component  of  the  order  of  2%. 
However,  a  predominantly  blue  color  needs  a  larger  relative  change  in  the  red 
component  to  be  distinguished. 


33. MacAdam, D., "Visual Sensitivities to Color Differences in Daylight",
Journal of the Optical Society of America, Vol. 32, No. 5, pp. 247-274,
May 1942.

34. Hunt, R., "The Reproduction of Colour", Fountain Press, England, 1975.

35. Gonzalez, R. and Wintz, P., "Digital Image Processing", Addison-Wesley
Publishing Co., Reading, Massachusetts, 1977.

VISUAL TASK PERFORMANCE

Developing a visual simulation system which provides imagery
indistinguishable from the real world is not a valid goal for training.
The goal, as stated previously, is to provide an environment in which visual
skills can be learned. In the following paragraphs some data on the visual
information required to perform certain visual tasks will be described.

Shape  Recognition.  LeGrand36  gives  criteria  for  recognizing  geometric 
forms  as  3  arc  minutes  for  the  length  of  the  sides  of  a  triangle;  4  arc  minutes 
for  the  sides  of  a  square;  4  arc  minutes  for  the  diameter  of  a  circle;  and  a 
1%  difference  in  axis  length  for  distinguishing  a  circle  from  an  ellipse. 
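
These angular criteria can be turned into the greatest range at which a
modeled form of a given size should still be recognizable. The Python sketch
below is an illustration under that reading of the criteria; the function
and table names are hypothetical, and the angular values are those quoted
from LeGrand above.

```python
import math

# Minimum angular size (arc minutes) for recognizing each form, as quoted.
RECOGNITION_ARCMIN = {"triangle": 3.0, "square": 4.0, "circle": 4.0}


def max_recognition_range_m(shape, size_m):
    """Greatest range (m) at which a form of size_m meets its criterion.

    size_m is the side length (triangle, square) or diameter (circle).
    """
    angle_rad = math.radians(RECOGNITION_ARCMIN[shape] / 60.0)
    return size_m / math.tan(angle_rad)


if __name__ == "__main__":
    for shape in ("triangle", "square", "circle"):
        print(shape, round(max_recognition_range_m(shape, 1.0)), "m")
```

For a 1 meter feature this gives a range of roughly 860 meters for a square
or circle and roughly 1150 meters for a triangle.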

Color. The requirement for color versus monochrome displays in a visual
simulation system has not been experimentally verified. Target detection
experiments (Wagner37) indicate that color is better, but not significantly so.
For visual search and identification tasks, Christ38 has found that
there is no particular advantage or disadvantage as measured by task performance
for many tasks. However, he found that for some tasks color could be very
effective under certain conditions. Ali39 describes a color-based computer
analysis of aerial photographs in which color not only provides an identifying
feature with which a particular object can be recognized by the machine, but
also provides a basis for the initial separation of the individual objects
in the perception of the scene.

Although  color  has  not  yet  been  demonstrated  to  be  necessary  in  visual 
simulation  for  training  it  is  usually  one  of  the  items  specified  as  desirable 


36. LeGrand, Y., "Form and Space Vision", Indiana University Press, Bloomington,
1967.

37. Wagner, D., "Target Detection With Color Versus Black and White Television",
Report TP5731, Naval Weapons Center, China Lake, CA, April 1975.

38. Christ, R., "Four Years of Color Research for Visual Displays", in Proc.
of Human Factors Society - 21st Annual Meeting, pp. 319-321, October 1977.

by the trainees (Rivers40, Chase41). McGrath42 provides a rationale for color
simulation based on pilot training objectives and various mission tasks in
terrain flight.

Gray Levels. Mezrich43 has developed a vision model to compute the number
of just noticeable differences in display perception. He states that 6 bits
suffice for a 10 FTL display. Another interesting parameter described in
his report is that the contrast sensitivity peaks at 3 cycles/degree as seen
by the observer. He also found that the power spectrum of natural scenes
could generally be described by 5 bits of luminance and that the perceived
information capacity of a color display is greater than that of a monochrome
display for the same bandwidth.

Texture.  Richards44  has  proposed  that  texture  perception  is  analogous 
to  col or  percepti on .  The  eye's  response  to  colors  can  be  explained  by  assuming 
the  presence  of  three  detectors  in  the  retina,  each  one  having  different  spec¬ 
tral  response.  Richards  proposes  and  finds  experimental  evidence  that  texture 
perception  can  be  explained  by  the  presence  of  texture  sensors  in  the  retina. 

He  has  found  that  the  texture  "primaries"  are  approximately  1,  3,  6,  and  11 
cycles/degree.  Therefore,  any  texture  can  be  simulated  by  forming  its 
texture  metamer  from  a  composition  of  these  spatial  frequencies.  Since  the 
texture  primaries  are  defined  in  terms  of  subtended  visual  angle,  the  synthesis 
of  a  given  texture  is  a  function  of  range  and  aspect  angle  of  the  textured 
surface. 
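
The dependence on range and aspect angle can be illustrated with a short calculation. The following is a minimal sketch, not taken from Richards' report: it converts each texture primary (in cycles/degree) into the length of one cycle measured on the textured surface. The function name, the small-angle approximation, and the treatment of aspect angle as the angle between the line of sight and the surface are assumptions made for illustration.

```python
import math

# Texture primaries quoted in the text, in cycles per degree of visual angle.
TEXTURE_PRIMARIES_CPD = (1.0, 3.0, 6.0, 11.0)

def surface_period_m(cycles_per_degree, range_m, aspect_deg=90.0):
    """Length on the surface (meters) covered by one cycle of a primary.

    aspect_deg is the assumed angle between the line of sight and the
    surface: 90 means the surface is viewed face-on.
    """
    angle_per_cycle_rad = math.radians(1.0 / cycles_per_degree)
    # Small-angle footprint measured perpendicular to the line of sight.
    footprint_m = range_m * angle_per_cycle_rad
    # Foreshortening: at grazing angles one cycle covers more of the surface.
    return footprint_m / math.sin(math.radians(aspect_deg))

def primary_periods(range_m, aspect_deg=90.0):
    """Surface period of every primary at the given range and aspect."""
    return [surface_period_m(f, range_m, aspect_deg) for f in TEXTURE_PRIMARIES_CPD]
```

Under these assumptions the 1 cycle/degree primary corresponds to a pattern about 1.7 meters long on a face-on surface at 100 meters, and to a pattern twice as long when the same surface is seen at a 30 degree aspect angle.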

Flight  Training.  Stark45  describes  a  methodology  for  selecting  the  visual 
information for CIG representation. For example, air-to-air training tasks
require  only  a  checkerboard  simulation  of  the  ground  to  enable  the  trainee 
to  obtain  cues  to  his  altitude,  altitude  rate,  and  ground  speed  and  highly 


40Rivers, H. and VanArsdall, R., "Simulator Comparative Evaluation", in
Proc. of 10th NTEC/Industry Conference, pp. 37-42, November 1977.

41 

Chase, W., "Effect of Color on Pilot Performance and Transfer Functions
Using a Full-Spectrum, Calligraphic, Color Display System" in Proceedings
of AIAA Vision Simulation and Motion Conference, April 1976.

42 

McGrath,  J.,  "The  Use  of  Wide-Angle  Cinematic  Simulators  in  Pilot  Training", 
Technical  Report  NAVTRAEQUIPCEN  70-C-0306-1,  March  1973. 

43 

Mezrich, J.; Carlson, C.; and Cohen, R., "Image Descriptors for Display",
Office of Naval Research Report ONR-CR213-120-3, February 1977.

44 

Richards,  W.,  "Experiments  in  Texture  Perception",  ADA059630,  January  1978. 
45Stark, E.; Bennett, W.; and Borst, G., "Designing DIG Images for
Systematic  Instruction",  in  Proc.  of  10th  NTEC/Industry  Conference,  pp. 
147-155,  November  1977. 
detailed imagery of the target aircraft to make effective judgments of
range and attitude. Basinger46 describes the qualitative attributes of a
full mission simulation. Ritchie47 emphasizes that perception is strongly
subjective and highly task dependent in developing design criteria for CIG
systems. These reports point out the need for research to develop perceptual
criteria based on training effectiveness.

Rivers48 describes an experiment in which Tactical Air Command (TAC) pilots
performed  subjective  evaluations  of  existing  flight  simulators.  Their  subjec¬ 
tive  opinion  was  that  current  systems  are  inadequate  for  air-to-surface 
tasks.  They  voiced  a  need  for:  multiple  moving  targets;  a  runway;  controlled 
ceiling  and  visibility;  adequate  gaming  area;  realistic  color;  sufficient 
scene  content  and  detail  to  determine  airspeed,  altitude,  and  area  orientation; 
visual  grayout/blackout;  sun  image;  field  of  view  equivalent  to  the  aircraft 
FOV;  and  weapons  effects. 

Kraft49 and Chase50 evaluated pilot acceptance, pilot performance, and
training  transfer  using  CIG  imagery.  Kraft  found  that  CIG  provides  acceptable 
crew  training  for  the  approach  and  landing  task  in  commercial  aircraft. 

Chase  found  different  levels  of  pilot  performance  and  acceptability  with 
different  colors  in  a  calligraphic  display. 

Kraft51 describes the results of a study to develop criteria for visual
systems for flight crew training in air transports. He concludes that the
visual simulation criteria are primarily driven by equipment limitations. His
recommendations are a minimum of 6 FTL display luminance (performance drops
off below 6 FTL) and display resolution of 3 arc minutes/pixel for daylight
scenes.


46Basinger, J. and Ingle, S., "Data Base Requirements for Full Mission
Simulation" in Proceedings of the 1977 Image Conference, Air Force Human
Resources Laboratory, Flying Training Division, Williams AFB, Arizona,
pp. 25-33, May 1977.

47 

Ritchie, M., "Object, Illusion, and Frame of Reference as Design Criteria
for  Computer-Generated  Displays",  SPIE,  Vol.  162,  Visual  Simulation  and  Image 
Realism,  pp.  8-10,  August  1978. 

48 

Rivers, H. and VanArsdall, R., "Simulator Comparative Evaluation", in
Proc.  of  10th  NTEC/Industry  Conference,  pp.  37-42,  November  1977. 

49 

Kraft, C.; Elworth, C.; Anderson, C.; and Allsopp, W., "Pilot Acceptance
and Performance Evaluation of Visual Simulation", in Proc. of 9th
NTEC/Industry Conference, pp. 235-249, November 1976.

50 

Chase, W., "Effect of Color on Pilot Performance and Transfer Functions
Using a Full-Spectrum, Calligraphic, Color Display System" in Proceedings
of AIAA Vision Simulation and Motion Conference, April 1976.

51Kraft,  C.  and  Shaffer,  L.,  "Visual  Criteria  for  Out  of  the  Cockpit 
Visual  Scenes",  in  AGARD  Conference  Proceedings  No.  249,  ADA063830,  pp. 
Terrain Flight. Ozkaptan52 describes the visual requirements for
nap-of-the-earth (NOE) flight simulation as: resolution of 3 arc minutes/pixel;
luminance of 50-100 FTL; field of view of 40° X 120°; full color; simulated
range to 20 miles. Key53 describes the visual requirements for an Army
Rotorcraft Research Simulator. He states that for an obstacle avoidance task
in NOE flight a field of view of at least 60° X 180° is required. Resolution
for this proposed research simulator is specified as 3 arc minutes/pixel or
better. Key states that objects such as targets can be made artificially large
in this simulator since combat simulation for training is not the goal.

Sanders54 has experimentally determined that the task of navigation during
terrain  flight  consumes  92%  of  the  navigator's  time.  He  has  described  this 
task  as  primarily  a  correlation  task  in  which  the  navigator  first  searches 
and  then  pattern  matches.  The  navigator  correlates  his  view  of  the  terrain 
with  his  map  or  photographs,  taking  into  account  seasonal  changes,  visibility, 
illumination,  day/night  differences,  and  changes  in  fields  and  roads  since  his 
reference  information  was  obtained. 

Target Acquisition. This visual task has apparently generated the greatest
amount of perception data available. Many experimental results under a variety
of conditions are described by Farrell55. The subject views a displayed scene
containing a target and background. His task is to acquire the target. His
performance is usually measured as a function of display parameters such as
resolution, contrast, field of view and display time. The performance criteria
are usually defined in terms of detection (something is present in the field);
orientation (where it is in the field); recognition (recognizing that the object
belongs to a class); and identification (identification of type within class).
Biberman56 gives the general quantitative resolution requirements in terms of
the number of line pairs subtended by the minimum critical object dimension
as: Detection = 1.0; Orientation = 1.4; Recognition = 4; and Identification
= 6.4. Booth57 gives similar values with the caveats that the subtended
visual angle must exceed 12 arc minutes and that the results are highly
dependent on the task and the background. Scott58 measured identification
only and scored the percentage of correct responses. His subjects scored 20%
correct at 1.5 line pairs and 90% at 7 line pairs per minimum vehicle
dimension. Scanlan59 measured time to detect as a function of background
complexity. He found that detection time for a high-complexity background
was 24 times that of a uniform background. Gaven60 measured identification
as a function of line pairs per vehicle and the number of gray levels. He
found that the number of quantized gray levels is inversely proportional to
the number of lines per vehicle for a given level of performance. Craig61
found that, for a given number of lines per vehicle, performance improved as
the field of view increased to approximately 10°, then leveled off. The target
size was a minimum of 30 arc minutes.
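
The line pair criteria quoted from Biberman can be turned into approximate acquisition ranges for a given display. The sketch below is illustrative only. It assumes a display whose angular size per line pair is known (for instance, 6 arc minutes per line pair if a 3 arc minute pixel display needs two pixels per line pair, which is an assumption and not a figure from the cited reports), and it ignores contrast, background complexity, and Booth's 12 arc minute caveat. All names are hypothetical.

```python
import math

# Line pairs across the minimum critical dimension, as quoted in the text.
CRITERIA_LINE_PAIRS = {
    "detection": 1.0,
    "orientation": 1.4,
    "recognition": 4.0,
    "identification": 6.4,
}

def max_range_m(dimension_m, line_pairs, arcmin_per_line_pair):
    """Greatest range at which dimension_m still subtends line_pairs."""
    angle_rad = math.radians(line_pairs * arcmin_per_line_pair / 60.0)
    return dimension_m / (2.0 * math.tan(angle_rad / 2.0))

def acquisition_ranges(dimension_m, arcmin_per_line_pair):
    """Approximate range limit for each acquisition level."""
    return {
        level: max_range_m(dimension_m, n, arcmin_per_line_pair)
        for level, n in CRITERIA_LINE_PAIRS.items()
    }

if __name__ == "__main__":
    for level, rng in acquisition_ranges(2.0, 6.0).items():
        print(f"{level:15s} {rng:8.0f} m")
```

For a 2 meter critical dimension and the assumed 6 arc minutes per line pair, identification is limited to a range of roughly 180 meters while detection extends past 1 kilometer.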

The quantification of background complexity in terms of perception and
cognition has been attempted by Ciavarelli62 and Hall63. Until such a
target-background complexity metric has been defined and tested relative to
performance of specific visual tasks, the specification and evaluation of
background complexity will continue to be subjective.
Photographic Interpretation. Wolf64 describes the basic characteristics
of photographic images which are utilized for interpretation as:

a.  Shape. This relates to the form, configuration, or outline of an
individual object. Shape is probably the most important single
factor in recognizing objects from their photographic images.

b.  Size. 

c.  Pattern.  The  repetition  of  certain  general  forms  or  relationships 
is  characteristic  of  many  objects. 

d.  Shadows.  Shadows  in  photographs  have  two  general  effects.  They 
afford  a  profile  view  of  the  object  casting  the  shadow  and  they 
hide  objects  within  them. 

e.  Tone.  Without  tonal  differences,  shapes,  patterns,  and  texture 
could  not  be  discerned. 

f.  Texture.  This  is  the  frequency  of  tone  change  in  the  image.  Texture 
is  produced  by  an  aggregate  of  unit  features  which  may  be  too  small 
to  be  clearly  discerned. 

g.  Site.  The  location  of  an  object  in  relation  to  other  features  may 
be  very  helpful  in  identification. 

Photographic  interpretation  is  not  a  skill  to  impart  to  a  trainee  in  a 
real-time  CIG  system  but  the  general  characteristics  listed  above  probably 
correlate  well  with  the  cues  utilized  by  such  a  trainee  as  he  observes  his 
visual  environment. 

RECOMMENDATIONS 

DETAIL  SIZE 

The amount of information necessary to model a visual environment extends
from a maximum in which the display is indistinguishable from the real world
to a minimum in which the display contains just enough visual cues to be
perceptually similar for the specific task to be learned. The former case
can be calculated from eye performance measurements and allowed closest
approach. The size of the data base becomes enormous if the visual environment
is to appear "realistic" for close approaches anywhere within the gaming
area. The latter case is ideal in terms of economy, but there is insufficient
data available to define just what minimum amount of information is required
for all tasks. A hypothesis is proposed as a strawman for scene detail
requirements based on the object acquisition studies described above.


64Wolf, P., "Elements of Photogrammetry", McGraw-Hill, New York, 1974.
Scene Detail Hypothesis. A visual environment need only be modeled to
a level of detail sufficient to identify the object with the smallest critical
minimum dimension for the particular visual tasks expected to be trained in
the simulation system.

For example, if the scenario involves search and acquisition of targets
no smaller than a tank and the minimum critical dimension of a tank is 2
meters, then the visual environment should be modeled such that it appears
indistinguishable from the real world when seen with 2 meters of object
subtending 7 line pairs of resolution, regardless of closest approach distance.

In a CIG data base which incorporates different models of the same object,
the above modeling criterion is pertinent to the highest level of detail
modeled. In practice the modeler would work from tank photographs whose
resolution is such that 7 line pairs could be resolved over a two meter distance
at the same range as the tank. The modeler then adds detail to his model until
the rendering of the CIG image resembles the tank image when they are both
observed at the same size and resolution. It is proposed that the entire data
base be constructed in this fashion, although artificial detail at the 7 line
pairs/2 meter criterion may be used as the highest texture spatial frequency
in data base areas where specific scene content is not required.

A data base modeled this way would not appear realistic at close range. For
example, at a 5 meter closest approach, eye resolution of 1 arc minute per line
pair implies approximately 1400 line pairs/2 meters. The tank modeled by the
7 line pair/2 meters criterion would be devoid of expected details; however,
it should still be capable of being identified as a tank, which was the purpose
for which it was intended. The justification for modeling the entire gaming
area to this apparent detail is to make the background scene as complex as the
smallest target at the identification level. This makes the entire acquisition
sequence (from detection through identification) just as difficult in the
simulation as in the real world.
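
The figure of approximately 1400 line pairs follows from simple geometry and can be checked with the sketch below. The function names are illustrative; the only inputs taken from the text are the 2 meter dimension, the 5 meter closest approach, the 1 arc minute per line pair eye resolution, and the 7 line pair modeling criterion.

```python
import math

def line_pairs_across(dimension_m, range_m, arcmin_per_line_pair=1.0):
    """Line pairs the eye can resolve across dimension_m seen at range_m."""
    angle_rad = 2.0 * math.atan(dimension_m / (2.0 * range_m))
    angle_arcmin = math.degrees(angle_rad) * 60.0
    return angle_arcmin / arcmin_per_line_pair

def detail_shortfall(dimension_m, range_m, modeled_line_pairs=7.0):
    """Ratio of resolvable detail to modeled detail at the given range."""
    return line_pairs_across(dimension_m, range_m) / modeled_line_pairs

if __name__ == "__main__":
    print(round(line_pairs_across(2.0, 5.0)))   # resolvable line pairs at 5 m
    print(round(detail_shortfall(2.0, 5.0)))    # factor by which detail is short
```

The exact subtended angle gives roughly 1360 line pairs and the small-angle approximation roughly 1375, both consistent with the rounded value of 1400. At that range the eye could resolve on the order of two hundred times more detail than the 7 line pair criterion supplies.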

Other  mission  scenarios  might  have  different  minimum  critical  dimensions. 
For  example,  consider  a  periscope  view  simulation.  For  identification  of  ship 
class  a  minimum  critical  dimension  might  be  50  meters  but  for  determination 
of  angle-on-the-bow  the  minimum  critical  dimension  would  be  smaller. 

DETAIL  REFLECTANCE 

Although  the  requirement  for  color  has  not  been  firmly  established,  it 
is  proposed  that  detail  spectral  reflectance  be  recorded  to  eight  bit 
precision  in  red,  green,  and  blue  primaries. 

ENVIRONMENT  CONTENT 

The choice of which objects should be included in the simulated environment
is somewhat subjective and task dependent. For example, a navigator
would expect an environment to contain objects or features which are
designated on the map he is using to navigate.
SECTION III
DATA  BASE  STRUCTURES 


INTRODUCTION 

This Section investigates the form of the representation of the environment
which is operated on by the image generator to produce the visual display.
Each representation class has its own advantages and disadvantages, which are
strongly dependent on the class of objects or surfaces to be modeled. Before
proceeding further, a distinction should be made between modeling and
designing in the context of this report. Modeling is defined as generating a
mathematical description of a real world environment. This is essentially a
copying process. Designing, on the other hand, involves the subjective
creation of a mathematical description. Modeling involves analyzing real world
objects in terms of the chosen environment representation, whereas designing
involves synthesizing simulated real world objects using the chosen environment
representation. Modeling does not require any intelligence or decision making
and is highly amenable to automation.

Brown65 describes the three basic problems of making a mathematical
representation of physical solids. These are: (a) obtaining the raw data or
physical measurements of the object; (b) representing the object description
in a concise form; and (c) using the representation to render a display. The
choice of representation is driven by both the means for obtaining the raw
data and the means for rendering. There is no best representation which will
be capable of efficiently accepting any form of raw data and efficiently
rendering any type of object. Blinn66 categorizes the most commonly used
three-dimensional surface representations as: algebraic, point set, and
parametric. Algebraic functions can be used to describe a limited number of
object classes. The data stored in this case is the type of function, the
coefficients which control it, and the region of the environment for which it
is valid.

Point set representations are the class to which current CIG data bases belong.
The data stored in a current CIG data base are the three-dimensional locations
of  points  (vertices)  together  with  information  concerning  which  points  make  up 
edges  or  lines,  which  edges  make  up  polygons,  and  which  polygons  make  up 
polyhedrons.  This  type  of  representation  is  best  suited  or  most  efficient  for 
the  representation  of  objects  which  have  planar  faces.  The  point  set  surface 
description  class  also  includes  those  data  base  forms  in  which  the  surface  to 
be  modeled  is  sampled  on  a  regular  grid.  In  such  a  data  base,  two  of  the 
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(ED.),  The  Department  of  Computer  and  Information  Science,  University  of 
Pennsylvania,  May  1979. 
three dimensions of the vertex point locations are specified by memory
location. The third class of surface description is the parametric representation.
In this representation the surface is divided into a regular or irregular mesh
of patches. The surface shape within a patch is then specified by an algebraic
function of parameters which are chosen for their convenient behavior within
the patch boundaries. In the case of the parametric representation, the data
base would contain the location of the patch (in world coordinates) and the
coefficients of the parametric equation describing the patch shape. The
fidelity of the model to the surface being modeled is a function of the degree
of the parametric function used (e.g. cubic, quadratic, quintic, etc.), the
size of the patch, and the desired patch to patch continuity. An alternative
data base format for parametric patch representation is the storage of three
dimensional locations of control points which have the property of containing
the information necessary to generate the parametric surfaces as the model is
rendered.

Volume representations form another class of three-dimensional models.
Objects are stored in the data base as conglomerations of primitive volume
elements. The data base would include an object location and a listing of the
type and relative location of the various volume elements required to render
the object.

Higher order environment models include semantic models in which a data
base entry might consist of an object name and its location.

Each of the above data base structures requires increasing complexity
of the CIG processing system to render a display, as the structure class
proceeds from algebraic, point set, and parametric surfaces to volume and
semantic representations.

Some effort has been devoted to standardization of graphics systems.
Bergeron67 states that lack of standardization is the most serious obstacle
to the widespread application of computer graphics. The Association for
Computing Machinery is currently putting together a proposed standard for
graphics (GSPC68). The only representation which was recommended to be
a standard by the ACM Committee was the polygon made up of the
three-dimensional coordinates of each of its vertices. There was no support
given to the standardization of other than polygon representations ... "at this
time, since current systems are too diverse."


Clark69 discusses a desired attribute of a data base other than its structure,
namely, a hierarchy of models having various levels of detail. Such hierarchical
data bases have been implemented in CIG systems where it would be inefficient
to operate on a data base which uses models at a high level of detail
regardless of the simulated range. Thomason70 applies this concept to a
relational data base.
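
The idea of a level of detail hierarchy can be illustrated with a small sketch. It is not drawn from Clark or from any particular CIG system: the class name, the fields, the range thresholds, and the polygon counts are hypothetical, and a real system may switch levels on subtended angle or processing load rather than on range alone.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DetailLevel:
    name: str
    max_range_m: float   # this model is used out to this simulated range
    polygon_count: int   # illustrative measure of model complexity

def select_level(levels: List[DetailLevel], range_m: float) -> Optional[DetailLevel]:
    """Return the most detailed model whose range limit covers range_m."""
    for level in sorted(levels, key=lambda lv: lv.max_range_m):
        if range_m <= level.max_range_m:
            return level
    return None  # beyond every limit: the object is not rendered

# Hypothetical three-level hierarchy for a single object.
TANK_LEVELS = [
    DetailLevel("high", 500.0, 120),
    DetailLevel("medium", 2000.0, 40),
    DetailLevel("low", 8000.0, 8),
]
```

With this arrangement the image generator processes the 120 polygon model only when the object is within 500 meters, and falls back to coarser models, or to nothing, as the simulated range increases.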

This  section  is  only  concerned  with  the  types  of  representation  used  in 
computer  graphics.  Section  IV  will  describe  techniques  for  generating  the 
data  to  make  the  model. 

ALGEBRAIC  SURFACES 

A  surface  described  solely  by  algebraic  functions  may  potentially 
stretch  to  infinity.  The  degree  of  complexity  of  the  surface  is  dependent  on 
the  complexity  of  the  functions  used  to  describe  it.  The  higher  the  complexity 
of  the  functions  the  more  difficult  it  is  to  render  the  model  into  a  display. 
Planes  are  modeled  by  linear  functions.  In  a  rectangular  coordinate  system 
the  general  form  for  the  equation  of  a  plane  is  given  by  equation  3-1. 

3-1  Ax  +  By  +  Cz  +  D  =  0 

The  specification  of  the  coefficients  A,  B,  C,  and  D  is  sufficient  to  describe 
a  model  of  a  plane  surface.  Since  plane  surfaces  in  the  real  world  do  not 
stretch  to  infinity  more  information  is  required  to  model  real  world  plane 
surfaces.  This  information  can  be  in  the  data  base  or  can  be  computed  in  the 
rendering  process.  For  example,  if  the  real  world  surface  consists  of  two 
planes,  the  data  base  can  specify  the  boundary  line  beyond  which  one  of  the 
planes  is  valid  or  the  processing  can  determine  the  boundary  by  computing  the 
line  describing  the  intersection  of  the  two  planes. 
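
Computing the boundary in the rendering process amounts to intersecting two planes of the form of equation 3-1. The following sketch shows one standard way to do this with vector algebra. It is an illustration rather than a description of any particular image generator, and the function names are assumptions.

```python
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def plane_intersection(p1, p2, eps=1e-12):
    """Intersect planes given as (A, B, C, D) with Ax + By + Cz + D = 0.

    Returns (point, direction) describing the line of intersection,
    or None when the planes are parallel (or coincident).
    """
    n1, n2 = p1[:3], p2[:3]
    h1, h2 = -p1[3], -p2[3]          # rewrite each plane as n . x = h
    u = _cross(n1, n2)               # direction of the intersection line
    uu = _dot(u, u)
    if uu < eps:
        return None
    a = _cross(n2, u)
    b = _cross(u, n1)
    # Point satisfying both plane equations: (h1 (n2 x u) + h2 (u x n1)) / |u|^2
    point = tuple((h1 * a[i] + h2 * b[i]) / uu for i in range(3))
    return point, u

def plane_value(p, x):
    """Left-hand side of equation 3-1 evaluated at point x."""
    return p[0] * x[0] + p[1] * x[1] + p[2] * x[2] + p[3]
```

For the planes z = 0 and x = 1, written as (0, 0, 1, 0) and (1, 0, 0, -1), the function returns the point (1, 0, 0) and a direction along the y axis. Storing the boundary in the data base trades this computation for additional storage.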

The next degree of surface complexity which can be represented by algebraic
functions is the second degree polynomial of the form given by equation 3-2.

3-2  Ax^2 + By^2 + Cz^2 + Dxy + Exz + Fyz + Gx + Hy + Jz + K = 0

The  types  of  surfaces  capable  of  being  modeled  by  this  general  equation  are 
cylindrical  surfaces  (functions  of  just  two  of  the  three  variables),  conical 
surfaces (homogeneous equations in the variables x, y, and z), spheres,
ellipsoids,  hyperboloids,  elliptic  paraboloids,  and  hyperbolic  paraboloids. 

These  surfaces  and  their  various  permutations  make  up  the  family  of  seventeen 


69Clark, J., "Designing Surfaces in 3-D", Comm. of ACM, Vol. 19, No. 8,
pp. 454-460, August 1976.

70Thomason, M., "Applications of Probabilistic Information Theory to Relational
Data Bases", SPIE, Vol. 186, pp. 224-229.
quadric surfaces. The modeling of surfaces in quadric and linear polynomials
has been accomplished in non-real-time image generation systems for simulation
of real world scenes (Gardner71,72, Yan73, and Levin74).
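
A quadric data base entry can be sketched as the ten coefficients of the general second degree polynomial given earlier in this subsection. The example below is illustrative and is not taken from the cited systems; the function names are assumptions, and the sphere is used only because its coefficients are easy to verify by expanding (x-cx)^2 + (y-cy)^2 + (z-cz)^2 - r^2 = 0.

```python
def quadric_value(coeffs, x, y, z):
    """Evaluate the general quadric polynomial at (x, y, z).

    coeffs is the tuple (A, B, C, D, E, F, G, H, J, K).
    A value of zero means the point lies on the surface.
    """
    A, B, C, D, E, F, G, H, J, K = coeffs
    return (A * x * x + B * y * y + C * z * z
            + D * x * y + E * x * z + F * y * z
            + G * x + H * y + J * z + K)

def sphere_coeffs(cx, cy, cz, r):
    """Coefficients for a sphere of radius r centered at (cx, cy, cz)."""
    return (1, 1, 1, 0, 0, 0,
            -2 * cx, -2 * cy, -2 * cz,
            cx * cx + cy * cy + cz * cz - r * r)
```

A sphere is therefore stored as ten numbers regardless of its size. The sign of `quadric_value` distinguishes points inside the sphere (negative) from points outside it (positive), which is one way a region of validity could be tested.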

The  advantage  of  utilizing  quadric  models  is  the  efficiency  with  which 
surfaces  such  as  spheres,  ellipsoids,  etc.  can  be  stored  in  the  data  base. 

The disadvantages of such representations are: (a) the complexity of the
surface intersections which must be stored or computed (the intersection of
two quadric surfaces is a fourth degree polynomial in the general case); and
(b) the surfaces modeled are restricted to the seventeen quadric surfaces.

The  representation  of  surfaces  by  equations  of  higher  degree  is  potentially 
possible  but  difficult  to  implement  due  to  the  complexity  involved. 

POINT SET SURFACES

Polygons.  In  a  point  set  surface  representation  the  basic  information 
stored  in  the  data  base  is  the  three-dimensional  location  of  points.  All  current 
real-time  CIG  systems  employ  point  set  surfaces  as  the  preferred  data  base 
representation  for  modeling  arbitrarily  shaped  real  world  objects.  The  specific 
type  of  point  set  representation  used  in  these  systems  is  one  in  which  points 
are grouped to define edges, polygons and polyhedrons. The data base
constructed for such CIG systems must conform to specific modeling rules imposed
by the processing capabilities of the real-time hardware. Morland75 describes
the real-time CIG system at NAVTRAEQUIPCEN and the modeling rules which the
modeler must follow if the environment is to be properly rendered. For example:
polygon faces must be convex, the vertices making up the polygon face must be
co-planar, the vertices must be numbered in a clockwise fashion when viewed


71Gardner, G., "Computer Image Generation System With Efficient Image Storage",
from the visible side of the face, objects made up of convex polygons must
be convex polyhedrons, objects are limited to a maximum of sixteen faces, and
the number of edges in the environment as well as the number of edges in any
potential field of view must not exceed the on-line storage capability or
edge processing capability, respectively, of the real-time hardware. The
polygon class of surface representation is most efficient for modeling real
world objects composed of planar surfaces. The modeling of smoothly curved
surfaces is less efficient with this technique since many polygons are required.
This difficulty is somewhat overcome by the use of shading techniques in the
rendering process which eliminate the appearance of edges on a polygon model
of a smoothly curved surface. However, the silhouettes of such models will
still have straight lines.
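
Several of the modeling rules listed above lend themselves to automatic checking before a data base is loaded. The sketch below is a hypothetical checker, not the NAVTRAEQUIPCEN software. It tests coplanarity, convexity of a simple (non-self-intersecting) face, and the sixteen face limit, and it derives the visible-side normal implied by the clockwise numbering rule. It does not test that an object is a convex polyhedron or that edge counts fit the hardware.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def newell_normal(vertices):
    """Unnormalized face normal (Newell's method). It points toward the
    side from which the vertex order appears counterclockwise."""
    nx = ny = nz = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(vertices, vertices[1:] + vertices[:1]):
        nx += (y0 - y1) * (z0 + z1)
        ny += (z0 - z1) * (x0 + x1)
        nz += (x0 - x1) * (y0 + y1)
    return (nx, ny, nz)

def visible_side_normal(vertices):
    """Normal toward the visible side when vertices are numbered clockwise
    as seen from that side (the opposite of the Newell normal)."""
    nx, ny, nz = newell_normal(list(vertices))
    return (-nx, -ny, -nz)

def face_violations(vertices, tol=1e-9):
    """List the rule violations found for one polygon face."""
    verts = list(vertices)
    if len(verts) < 3:
        return ["fewer than three vertices"]
    n = newell_normal(verts)
    length = math.sqrt(_dot(n, n))
    if length == 0.0:
        return ["degenerate face"]
    unit = tuple(c / length for c in n)
    problems = []
    if any(abs(_dot(unit, _sub(v, verts[0]))) > tol for v in verts):
        problems.append("vertices not coplanar")
    count = len(verts)
    for i in range(count):
        e1 = _sub(verts[(i + 1) % count], verts[i])
        e2 = _sub(verts[(i + 2) % count], verts[(i + 1) % count])
        # Every turn must agree with the overall winding of the face.
        if _dot(_cross(e1, e2), unit) < -tol:
            problems.append("face not convex")
            break
    return problems

def object_violations(faces, max_faces=16):
    """Check the face-count limit and every face of one object."""
    problems = []
    if len(faces) > max_faces:
        problems.append(f"object has {len(faces)} faces, limit is {max_faces}")
    for index, face in enumerate(faces):
        problems.extend(f"face {index}: {p}" for p in face_violations(face))
    return problems
```

A unit square passes all face checks, while an L-shaped face is reported as not convex and would have to be split into convex pieces by the modeler.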

Many polygon oriented visible surface algorithms have been developed
(Watkins76, Sutherland77) and implemented in both real-time and non-real-time
image generators. The basic reason for such widespread use of this particular
surface representation is the relative simplicity of the geometric
transformations required for rendering a display on a flat screen such as a CRT
monitor. This is summarized as: straight lines in the model transform to
straight lines in the display. Carlbom78 describes the variety of ways in which
three-dimensional objects can be projected to a planar display. Polygon based
image generators are continuously being refined to produce higher quality
renderings. The latest developments involve the addition of texture to a polygon
face (Bunker79) and the utilization of translucent faces (Bunker80).
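
The property that straight lines remain straight can be demonstrated with a simple perspective projection. The sketch below assumes a viewer at the origin looking along the +z axis with a flat screen at distance `focal_length`. This is one common arrangement chosen for illustration and is not attributed to any of the cited systems.

```python
def project(point, focal_length=1.0):
    """Perspective projection of a 3-D point onto a flat screen.

    The viewer is at the origin looking along +z; points at or behind
    the viewer cannot be projected.
    """
    x, y, z = point
    if z <= 0.0:
        raise ValueError("point is not in front of the viewer")
    return (focal_length * x / z, focal_length * y / z)

def collinear_2d(p, q, r, tol=1e-12):
    """True when three screen points lie on one straight line."""
    area2 = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(area2) <= tol

# Two end points of a model edge and a point halfway along it.
EDGE_START = (-1.0, 0.0, 2.0)
EDGE_END = (3.0, 2.0, 6.0)
EDGE_MIDDLE = (1.0, 1.0, 4.0)
```

Because the projected middle point falls on the line joining the projected end points, an edge can be rendered by transforming only its two vertices and drawing a straight line between them. Note that the middle of the edge does not project to the middle of the screen line, which matters when texture or shading is interpolated along it.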

Fixed  Grid  Arrays.  The  construction  of  a  math  model  describing  the  relief 
of  the  earth's  surface  existed  as  a  requirement  long  before  there  were  CIG 
systems.  Such  math  models  of  the  earth's  surface  are  called  digital  terrain 
models (DTM). The users of DTM have different requirements for the form of
the  terrain  information.  Geomorphologists  prefer  the  DTM  to  be  a  set  of 
contiguous non-overlapping polygons restricted to the horizontal plane whose
boundaries are indicative of landforms. Surveyors prefer the representation
of the terrain to be a polyhedral solid which approximates the terrain surface
in three dimensions and adapts in density and complexity to the local
topography. The cartographer prefers the terrain information to be in the form
of lines such as profiles or contours. Despite the fact that none of the users
of DTM desire the terrain model to be in the form of a regular grid of
elevations, this is the form of terrain model which is most widely used (Mark81).
Gridded data in which elevation is sampled at regular increments in latitude
and longitude is inefficient. In order to have sufficient information to
reproduce complex terrain the increments must be small, but this implies a
large number of samples even in areas where the terrain is flat. Dutton82
points out the reasons for using gridded data even though it is inefficient.
Gridded data is the easiest to generate since an automatic elevation measuring
system does not have to make decisions on where to bound polygons. Gridded
data is easiest to transport between different analysis systems; to compare
one set of data with another; to display; and to conceptualize. Other
advantages for graphics applications include: data access need not be global; and
overlays can be accomplished with limited computational memory and inexpensive
algorithms. For reasons such as these the Defense Mapping Agency, which is
assigned overall responsibility for mapping, charting and geodetic resources
in the Department of Defense, chose to use a regular grid representation for
terrain elevation data (DMA83). A description of the DMA data base and an
evaluation of its application to radar display simulation is given by Hoog84
and the Defense Mapping Agency Aerospace Center (DMAAC85). Salmen86 surveys,
assesses, and compares 54 existing computer software systems and geographic
data bases. This report is indicative of the non-standardization of
application programs, which is the reason for desiring a data base form which
is easily transportable.
Many processing algorithms have been developed which utilize gridded data
directly in creating imagery. Strat87 utilizes an algorithm which takes a
grid of elevation data and displays perspective or orthographic views in which
pixel intensity is a function of surface normal and a simulated terrain
illumination direction. Unruh88 and Schachter89 describe algorithms in which both
elevation and spectral reflectance values in the grid data base are utilized
to produce displays. Faintich90 describes capabilities for generating displays
in which gray level is a function of elevation, contoured displays, shaded
relief displays, and stereo displays. Dungan91 describes an algorithm
implementation which can have several surfaces in gridded data format. Thus, visual
effects such as clouds or haze can be generated from a grid data base.
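
A shaded relief display of the kind described above can be sketched directly from a grid of elevations. The example below is a generic illustration, not the algorithm of any cited author. It assumes a square grid spacing, estimates slopes by central differences, and uses simple cosine (Lambertian) shading; the function name and parameters are hypothetical. The horizontal position of each elevation post is implied by its row and column, so only the elevation itself is stored.

```python
import math

def shaded_relief(elev, spacing_m, sun_dir):
    """Intensity (0..1) for each interior post of an elevation grid.

    elev      : list of rows of elevations in meters; row index maps to y,
                column index maps to x, both in steps of spacing_m
    sun_dir   : vector pointing from the terrain toward the light source
    Border posts are left at 0.0 because central differences need neighbors.
    """
    norm = math.sqrt(sum(c * c for c in sun_dir))
    sx, sy, sz = (c / norm for c in sun_dir)
    rows, cols = len(elev), len(elev[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dzdx = (elev[r][c + 1] - elev[r][c - 1]) / (2.0 * spacing_m)
            dzdy = (elev[r + 1][c] - elev[r - 1][c]) / (2.0 * spacing_m)
            # The surface z = f(x, y) has normal (-dz/dx, -dz/dy, 1).
            length = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            cosine = (-dzdx * sx - dzdy * sy + sz) / length
            out[r][c] = max(0.0, cosine)  # faces turned away from the light are black
    return out
```

Flat terrain lit from directly overhead yields full intensity, while a 45 degree slope under the same light yields about 0.71. Because each post needs only its immediate neighbors, the computation illustrates the point made earlier that data access need not be global.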

Algorithms  which  transform  gridded  data  into  more  efficient  forms  are 
described  in  Section  4. 

PARAMETRIC  SURFACES 

This  class  of  surface  modeling  divides  the  surface  into  patches  whose 
location  is  specified  in  world  coordinates.  Within  each  patch  the  surface 
variation  is  described  in  terms  of  parametric  functions  chosen  for  their  ability 
to  efficiently  model  the  surface  within  the  patch.  Planar  patches  are  identical 
to  the  polygon  point  set  representation.  Quadric  patches  utilize  quadric 
algebraic  functions  within  a  patch.  The  next  level  of  surface  complexity 
within a patch is that described by bicubic functions and so on. Forrest92
gives a good summary of the various patch modeling and designing techniques.
Depending on the specific patch technique used, the data base will contain
coefficients of the function within a patch or the location of control points
which  can  be  used  to  generate  the  proper  surface  shape.  Some  techniques 
utilize  control  points  which  are  on  the  surface  while  others  use  control 
points  which  are  remote  from  the  surface.  Brewer93  describes  patches  which 
can  be  constructed  from  points  on  the  surface.  Quadric  patches  do  not  have 
enough  degrees  of  freedom  to  satisfy  slope  continuity  between  patches  for 
arbitrary  surfaces  but  can  be  used  where  such  continuity  is  not  required. 
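
As an illustration of a patch defined by control points which are remote from
the surface, the sketch below evaluates a bicubic patch in the Bezier form.
The Bezier form is chosen here only as a familiar example; it is not
attributed to any of the cited authors, and the function names are
hypothetical.

```python
def bernstein3(t):
    """Cubic Bernstein basis values at parameter t in [0, 1]."""
    s = 1.0 - t
    return (s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t)

def eval_bicubic_patch(ctrl, u, v):
    """Evaluate a bicubic Bezier patch at parameters (u, v).

    ctrl is a 4 x 4 grid of (x, y, z) control points in world coordinates;
    u and v both lie in [0, 1].
    """
    bu, bv = bernstein3(u), bernstein3(v)
    point = [0.0, 0.0, 0.0]
    for i in range(4):
        for j in range(4):
            w = bu[i] * bv[j]          # weight of control point (i, j)
            for k in range(3):
                point[k] += w * ctrl[i][j][k]
    return tuple(point)
```

The data base entry for such a patch is just the sixteen control points. The
surface passes through the four corner points but generally not through the
interior ones, which only pull the surface toward themselves.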

Mahl94 describes algorithms for displaying surfaces made up of quadric patches.
Algorithms for bicubic patches (Catmull95, 96, Hosaka97) and biquintic patches
(Munchmeyer98) have also been developed for displaying such surfaces. Wu99
describes a technique for storing surface data as sectional curves
(two-dimensional profiles of the surface sliced into parallel sections). His
algorithm for displaying such a surface made up of B-spline functions
interpolates between sections using cardinal spline functions. Blinn100, 101
has developed a technique for applying texture to parametric patches which
yields very impressive imagery. All of the patch display algorithms have one
major drawback at this time: they are too computationally expensive to operate
on a complex scene in real-time.

VOLUME REPRESENTATIONS

Modeling  solid  objects  can  also  be  accomplished  by  representations  in 
which  the  data  base  describes  the  objects  as  compositions  of  primitive  solid 
building  blocks.  The  simplest  form  of  such  a  data  base  is  a  three-dimensional 
rectangular fixed grid of volume elements (Herman102). Braid103 models objects
as additions and subtractions of primitive solids such as cubes, wedges, and
cylinders. Soroka104 uses generalized cylinders as primitives. A generalized
cylinder is stored in the data base as a location, an axis and a function
which describes the cross section at each point along the axis. The University
of Rochester Production Automation Project has generated a significant body
of literature on solid modeling systems (Requicha105, Voelcker106, Brown107).
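
A generalized cylinder record of the kind just described might be sketched as
below. This is a simplified illustration, not Soroka's representation: it
assumes a straight axis and a circular cross section described by a radius
function, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class GeneralizedCylinder:
    location: Vec3                        # base point of the axis
    axis: Vec3                            # direction and length of the axis
    radius_at: Callable[[float], float]   # cross-section radius, s in [0, 1]

    def volume(self, steps: int = 1000) -> float:
        """Approximate the volume by summing thin circular slices."""
        length = math.sqrt(sum(a * a for a in self.axis))
        ds = 1.0 / steps
        total = 0.0
        for i in range(steps):
            r = self.radius_at((i + 0.5) * ds)   # radius at slice midpoint
            total += math.pi * r * r * length * ds
        return total
```

Three stored items describe the whole solid, which illustrates the storage
efficiency of volume primitives; the cost appears when the solid must be
turned into displayable surfaces.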

Volume representations can be very efficient in terms of data base storage
requirements. The rendering of such data bases into displays is, in general,
more complex and computationally expensive than the rendering of surface
representations.

SEMANTIC  REPRESENTATIONS 

The data base which stores information in the form of a high-level language
is probably the most efficient model form. The words "blue '59 Chevy
parked in front of a hospital" can certainly be rendered into a display by the
human brain. The processing required by a computer to produce such a rendering
from such information is difficult to conceive. Semantic models are useful,
however, when the requirement is not to display in real-time but to organize
a math model data base to allow it to be intelligently addressed by the
modeler or processor (McKeown108, Agin109).

CONCLUSION 

Brown110  summarizes  the  characteristics  of  a  good  modeling  system  as 
(a)  geometric  coverage  and  tolerance  (includes  the  capability  to  represent 
all  shapes  to  the  desired  accuracy),  (b)  completeness  (sufficient  information 
about  each  object  for  current  and  future  applications),  (c)  reliability  (the 
system  should  be  able  to  verify  or  guarantee  the  correctness  of  the  data  it 
contains), and (d) efficiency (the representation should be capable of
supporting a variety of applications efficiently). Brown states that he knows
of no
geometric  modeling  system  with  these  characteristics  although  a  half-dozen 
or  more  are  currently  under  development. 

As far as current practical systems are concerned, point set data bases
will continue to dominate CIG model representation. Regular grid models are
easiest to generate automatically and polygon models are easiest to display
with currently available technology.


SECTION IV

DATA  ACQUISITION  AND  REDUCTION 


INTRODUCTION 

The basic problem in modeling a real world environment is to transform
real world data into a form or structure which can be recognized by the CIG
system which uses the model. The ultimate source of the data is the real
world itself, but the data used by the modeler may already have been
transformed into a non-CIG model, in which case the problem is one of
transforming one representation into another.

Currently,  the  environment  models  utilized  by  real-time  CIG  systems  are 
generated  by  tedious,  manpower  intensive  techniques.  The  modeler  utilizes 
data  sources  such  as  maps,  photographs,  scale  drawings,  and  blueprints.  The 
basic  information  obtained  from  these  sources  is  the  three-dimensional  location 
of  points  in  the  real  world  and  the  spectral  reflectance  properties  of  surfaces 
in  the  real  world.  Based  on  the  intended  use  and  capacity  of  the  CIG  system 
the  modeler  makes  subjective  decisions  on  which  points  and  surfaces  should  be 
included  in  the  model.  He  then  extracts  the  information  from  his  data  sources, 
puts  the  information  in  the  form  and  structure  required  by  the  CIG  system,  and 
subjectively  evaluates  the  imagery  rendered  by  the  CIG  system  using  the  model 
as  a  data  base.  Monroe111  explains  this  process  in  great  detail  as  implemented 
in  the  environmental  model  generation  of  what  is  the  largest  CIG  data  base 
existing.  In  actual  practice  many  iterations  of  the  above  process  are  required 
before  the  modeler,  the  CIG  system,  and,  possibly,  the  users  are  satisfied  with 
the model. As the capacity of real-time CIG systems continually grows, the size
and complexity of the environmental model needed to support the CIG must grow.
Dependence on manpower intensive techniques is inadequate to support such
growth in terms of efficiency and cost. Any techniques which can automate
parts of the modeling process or reduce the amount of time required for the
modeler to complete parts of the process should improve the overall efficiency
of the process. Of course, the cost of the automation must be balanced against
the cost of the modeler's time.

The acquisition of position measurements from the real world environment can
be accomplished by a variety of techniques (Fuchs112). The most elementary
method is direct, manual measurement. With the aid of yardsticks, plumb lines,
and calipers a great many objects can be successfully measured. The modeler
first determines what he considers to be points of interest on the object and
then measures the coordinates of each of these points from a common reference
position. The surface of the object is then defined as a topological net over
these key points. The resulting model
tends  to  be  compact  (since  the  modeler  usually  tries  to  minimize  the  number  of 
points  he  must  measure)  and  an  effective  representation  of  the  object.  This 
manual  technique  can  be  automated  to  some  degree  by  substituting  a  machine 
for  the  yardstick  and  calipers.  The  machine  can  now  perform  the  measuring 
function  and  the  modeler  need  only  designate  the  points  of  interest  and  their 
connectivity. Vickers113 describes such a system in which a machine senses
the three-dimensional coordinates of the tip of a wand. The modeler then
touches the wand to a point on the object and indicates to the machine that it
is a point of interest by activating a switch on the wand. This technique,
like  the  purely  manual  technique,  is  time  consuming  and  not  practical  for  large 
complex  objects  or  environments.  However,  it  is  effective  for  small,  simple 
objects  as  long  as  the  machine's  "view"  of  the  wand  is  not  obstructed. 

Fuchs114 also discusses holographic and moire methods for data acquisition,
but the most practical data acquisition systems are based on multiple
two-dimensional images. An entire technical field is devoted to this technique,
stereo-photogrammetry. Stereo-photogrammetry is based on the fact that the
location of the image of a point in a photograph defines a line along which
the point must lie in the environment. Another photograph containing the same
point but taken from another position defines another line. The location where
the two lines intersect defines the point location. An obvious advantage of
this approach is that there is no need for the modeler to decide on, or
physically identify, the points of interest at the time the photographs are
taken, since any point visible in both photographs can be located using the
photographs alone.
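
The intersection of the two lines can be sketched as follows. Because measured
lines rarely meet exactly, this illustration returns the midpoint of the
shortest segment joining them; that choice, and the function names, are
assumptions made here rather than a procedure taken from the cited work.

```python
def dot3(a, b):
    """Dot product of two 3-D vectors."""
    return sum(x * y for x, y in zip(a, b))

def intersect_rays(p1, d1, p2, d2):
    """Locate a point from two lines of sight.

    p1 and p2 are the camera positions; d1 and d2 are the directions of the
    lines defined by the image of the point in each photograph. Returns the
    midpoint of the shortest segment joining the two lines.
    """
    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot3(d1, d1), dot3(d1, d2), dot3(d2, d2)
    d, e = dot3(d1, w0), dot3(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; no unique intersection")
    t = (b * e - c * d) / denom      # parameter along the first line
    s = (a * e - b * d) / denom      # parameter along the second line
    q1 = tuple(p + t * v for p, v in zip(p1, d1))
    q2 = tuple(p + s * v for p, v in zip(p2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))
```

Two cameras 4 units apart that both sight a point at (1, 2, -10) recover that
point; nearly parallel lines of sight (a short baseline) make the denominator
small and the result sensitive to measurement error.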

There  are  many  variations  of  the  stereo  technique.  Fuchs115  describes 
a computer controlled, random axis, triangulating rangefinder with a mirror
deflected laser and revolving disc detectors. Sutherland116 describes the
utilization of a large area digitizing tablet with multiple pens to designate
the point of interest in multiple views of the same object. (The multiple
views can extend in complexity from orthographic drawings to perspective
photographs.) Appel117 prefers orthographic projections since there is a
direct  correspondence  between  the  two-dimensional  view  and  two  of  the 
coordinate  axes  in  the  data  base. 

The  automation  of  stereo  techniques  requires  that  the  machine  be  capable 
of  determining  which  image  points  correspond  to  the  same  object  points  in 
multiple  views  containing  the  object.  Such  machines  are  currently  used  to 
produce  digital  terrain  models.  They  are  effective  when  the  surface  to  be 
modeled  is  sufficiently  textured  in  reflectance  and  yet  sufficiently  similar 
in  two  views  that  the  machine  can  find  the  same  point  by  correlation  techniques. 
These systems fail when the surface has large expanses of the same reflectance
(such as deserts or lakes) or when the terrain is so rugged that the views are
too dissimilar to correlate, as with the vertical side of a building which is
visible in one of the images but invisible in the other.
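
A minimal sketch of such a correlation search is given below. It assumes
normalized cross-correlation as the similarity measure and restricts the
search to a single scan line for brevity; the measure used by production
machines is not stated in the sources, and the function names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity windows.

    Returns None when either window has no variation (a featureless area),
    since the correlation is then undefined.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    var_a = sum(x * x for x in da)
    var_b = sum(x * x for x in db)
    if var_a == 0.0 or var_b == 0.0:
        return None
    return sum(x * y for x, y in zip(da, db)) / math.sqrt(var_a * var_b)

def best_match(left, right, center, half_width):
    """Find where the window around left[center] best matches in right.

    Returns (position, score); position is None if no correlation peak
    could be computed.
    """
    window = left[center - half_width:center + half_width + 1]
    best_pos, best_score = None, -1.0
    for pos in range(half_width, len(right) - half_width):
        candidate = right[pos - half_width:pos + half_width + 1]
        score = ncc(window, candidate)
        if score is not None and score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score
```

On a textured scan line the search returns the shifted position of the window,
and the shift is the quantity from which the point location follows. On a scan
line of constant reflectance it returns no position at all, which is the
failure on deserts and lakes noted above.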

Even if correlation techniques can generate the location of all points
in a scene, the modeler is still required to choose the points he wishes to
include in his model (unless the form of his model is a fixed grid). But
techniques are available to assist him in this task. Digital and optical
image processing technologies are capable (to a limited degree) of analyzing
a scene in terms of its natural boundaries. Roberts118 describes all of the
elements  of  this  process  but  the  hardware  sophistication  at  that  time  limited 
implementation  to  very  simple  scenes.  Most  of  the  application  of  these 
technologies  has  been  in  the  area  of  machine  recognition.  Recognition  implies 
that  the  machine  contains  a  reference  model  of  the  object  or  objects  it  is 
required  to  recognize.  The  output  of  such  a  machine  is  the  decision  as  to 
whether  a  reference  object  is  in  the  scene  as  well  as  its  location  and 
orientation  rather  than  the  geometric  description  of  an  arbitrary  object. 

Image  processing  techniques  offer  methods  for  analyzing  2-D  images  in  terms 
of  their  natural  boundaries.  Optical  image  processing  has  inherent  speed 
advantages  over  digital  image  processing.  This  has  led  to  extensive  research 
into  applications  where  speed  is  required  such  as  terminal  guidance  and 
threat identification (Neff119). The Navy has attempted to use this
powerful tool several times with little success (Trimble120). Yatz121
states that the optical processing field had better get moving if it is to
compete in terms of size, cost, capability and utility with digital technology.
Since speed is not of prime importance in modeling an environment for CIG
systems, the digital processing technologies have the advantage and will be
discussed in further detail in this report. The reader is referred to
Casasent122 and Nesterikhin123 for further information on optical processing.

Software  technologies  also  offer  considerable  assistance  to  the  modeler. 
Digital  data  describing  the  geometry  of  an  object  or  environment  is  already 
available  in  many  cases.  Programs  which  transform  this  available  data  into 
the  CIG  environmental  model  with  little  or  no  action  required  of  the  modeler 
can  be  implemented.  The  trend  in  both  the  civilian  (Edson124)  and  military 
(DMA125)  mapping  communities  is  to  record  cartographic  information  in  a  machine 
readable form. The computer aided design community, as part of the design
process, records object and shape descriptions in machine readable form.

Besides techniques for recording real world data, the modeler can use software
techniques to assist in designing (rather than modeling). For example, the
modeler can insert generically similar objects into the environment
after modeling an object just one time. This is called object instancing.
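
Object instancing can be sketched as a shared object description plus a
per-placement record. The class and field names below are hypothetical, and
the placement is limited to a position and a heading about the vertical axis
for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class ObjectModel:
    name: str
    vertices: Tuple[Vec3, ...]    # modeled once, in the object's own axes

@dataclass(frozen=True)
class Instance:
    model: ObjectModel            # shared reference, not a copy
    position: Vec3                # placement in environment coordinates
    heading_deg: float = 0.0      # rotation about the vertical (z) axis

    def world_vertices(self) -> List[Vec3]:
        """Transform the shared vertices into environment coordinates."""
        h = math.radians(self.heading_deg)
        ch, sh = math.cos(h), math.sin(h)
        px, py, pz = self.position
        return [(px + ch * x - sh * y, py + sh * x + ch * y, pz + z)
                for x, y, z in self.model.vertices]

# One tree is modeled; two placements reuse it.
tree = ObjectModel("tree", ((0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (1.0, 0.0, 0.0)))
grove = [Instance(tree, (10.0, 20.0, 0.0)),
         Instance(tree, (30.0, 40.0, 2.0), heading_deg=90.0)]
```

Each additional placement costs only a position and a heading, while the
vertex list is stored a single time.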

In the remainder of this section, photogrammetry, digital image processing,
artificial intelligence, and software modeling aids are discussed in greater
detail. In Section V the applications of these techniques in terms of specific
modeling tasks are discussed.

STEREO PHOTOGRAMMETRY

Basics. Photogrammetry is the science of taking measurements from photographs.
The two-dimensional location of an object's image in a photograph is
directly related to its two-dimensional direction from the camera's location
when the photograph was exposed. Measurements on the photograph are usually
in rectangular coordinates called photograph coordinates. The units of
photograph coordinates are usually microns. The real world direction of the
object is usually computed as two angles, specifying a two-dimensional
direction. The three-dimensional location of an object can be determined from
the photograph coordinates of its image in two photographs which are taken
from two different locations (stereo-photogrammetry). The procedures and
geometry of photogrammetry duplicate the real world measurements and
computations of a surveyor. Just as in surveying, there is a variety of
instruments available to assist in the data acquisition and data reduction
process leading to the production of a representation of the real world to the
desired degree of precision.

The  best  known  application  of  photogrammetry  is  the  generation  of  maps 
and  other  cartographic  products.  In  this  application  the  photographs  are 
usually aerial photographs. Aerial photos may be classed as vertical, in
which the optical axis of the camera is vertical and pointing down; low oblique,
in  which  the  optical  axis  is  deviated  from  vertical  but  the  recorded  image 
does  not  contain  the  horizon;  and  high  oblique,  in  which  the  horizon  is  contained 
in  the  recorded  image.  Most  aerial  photogrammetry  utilizes  vertical  photography. 
Terrestrial  photogrammetry  connotes  photography  in  which  the  camera  is  fixed 
to  the  ground.  If  the  optical  axis  is  perpendicular  to  the  vertical  direction 
the  photographs  are  called  horizontal  photographs. 

Single Photo. If the object being photographed lies in a plane perpendicular
to the optical axis of the camera and the camera lens is free from
distortion, the positions of image points in the photograph are directly
related to positions on the object by a simple scale factor. Figure 1 shows
such a situation.

The scale of the photograph is the focal length (F) of the lens divided
by the object distance (D), assuming object distance is much greater than the
focal length. For example, a vertical aerial photograph taken from an altitude
of 10,000 feet with a lens of focal length of 6 inches has a scale of 1:20,000.
The physical distance measured on the photograph between image points a and
b is equal to the physical distance between A and B in the object plane
multiplied by the scale factor. Using the same example, if a and b are measured
to be 1 mm apart in photograph coordinates, A and B would be separated by
20 meters at the object plane.
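
The arithmetic of this example can be checked with a few lines. The function
names are hypothetical; the only relation used is the one stated above, scale
equal to focal length divided by object distance.

```python
def photo_scale_denominator(focal_length, object_distance):
    """Return N for a photograph scale of 1:N.

    Both arguments must be expressed in the same unit.
    """
    return object_distance / focal_length

def object_separation(image_separation, focal_length, object_distance):
    """Convert a distance measured on the photograph to the object plane.

    The result is in the same unit as image_separation.
    """
    return image_separation * photo_scale_denominator(focal_length,
                                                      object_distance)

# The example from the text: 6 inch lens, 10,000 feet altitude.
n = photo_scale_denominator(6.0, 10000.0 * 12.0)        # both in inches
ground_m = object_separation(1.0, 6.0, 10000.0 * 12.0) / 1000.0  # mm to m
```

Here n evaluates to 20,000 and ground_m to 20, in agreement with the scale of
1:20,000 and the 20 meter separation quoted above.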

Photogrammetry usually employs metric cameras for obtaining photographs.
A metric camera has reference points, called fiducial marks, built into the
focal plane which allow accurate recovery of the principal point of the
photograph. The principal point is where the optical axis intercepts the image
plane (marked "o" in Figure 1). A metric camera is manufactured specifically
for dimensional stability. It is calibrated for focal length, coordinates of
the principal point, and residual lens distortion.

Tilt.  When  the  optical  axis  is  deviated  from  the  direction  perpendicular 
to  the  object  plane,  the  image  points  are  displaced  relative  to  their  positions 
in  the  truly  perpendicular  case.  Figure  2  shows  this  situation.  A  nominally 
vertical  aerial  photograph  is  usually  tilted  from  vertical  due  to  uncontrollable 
angular  positions  of  the  aircraft.  The  effect  of  tilt  on  image  points  can  be 
compensated  by  utilizing  control  points  to  determine  the  pointing  direction 
of  the  camera.  Tilt  can  occur  about  two  axes  (for  example,  aircraft  pitch 
and  roll  axes).  The  process  in  which  the  effects  of  camera  tilt  are  eliminated 
is called rectification. Rectification only eliminates image point
displacements for object points in the assumed object plane.


Figure  2.  Effect  of  Tilt  on  Image  Point  Locations 


Relief.  Although  a  plane  object  surface  can  be  assumed  in  many  cases 
such  as  aerial  photography  of  a  flat  terrain  from  a  high  altitude,  the  effect 
of  surface  relief  or  departure  of  the  surface  from  a  flat  plane  also  causes 
image displacement.


Figure 3.  Effect of Surface Relief


The  image  displacement  due  to  surface  relief  allows  the  relief  to  be 
computed  if  the  distance  to  the  object  plane  is  known  and  the  position  of  the 
relief  point  desired  as  projected  into  the  object  plane  is  known.  In  Figure  3 
such  a  situation  is  pictured.  Point  C  is  the  point  whose  relief  distance  is 
desired. Point B is the foot of the perpendicular dropped from C to the
object plane. Usually the location
of  Point  B  is  an  unknown  and  the  single  photograph  method  of  surface  relief 
measurement  is  not  used.  Another  obvious  effect  of  surface  relief  is  that  some 
parts  of  the  surface  are  capable  of  being  hidden  by  other  parts  of  the  surface. 

A  not  so  obvious  effect  is  that  relief  displacement  will  always  be  radially 
away  from  the  point  in  the  object  plane  intercepted  by  a  perpendicular  to  the 
object  plane  dropped  from  the  camera,  regardless  of  camera  tilt.  In  the  case 
of  aerial  photography  the  point  is  called  the  nadir. 

Stereo. The measurement or computation of surface relief utilizing two
or more photographs taken from different viewing positions is stereo
photogrammetry. The difference in image displacement of the same object point in
the  two  photographs  is  called  parallax.  Figure  4  shows  the  geometry  involved 
for  the  situation  in  which  neither  photograph  has  tilt  (or  has  been  rectified 
to  remove  tilt)  and  the  distance  D  from  the  camera  to  the  object  plane  is  the 
same  for  both  photographs. 


Figure 4.  Stereo Geometry


In Figure 4, B is the length of the baseline, or the distance between the
camera exposure positions; c is the image of object point C on each of the
two photographs; o is the principal point on each photograph; and X and X'
are the distances measured on the photographs from the principal points to
the image points c, in a direction parallel to the baseline. The difference
X - X' is called the parallax (p). From similar triangle relationships it can
be shown that the relief distance d = BC is given by equation 4-1.

4-1.  $d = D - \frac{F B}{p}$
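
The relation can be exercised numerically. The sketch below assumes the
similar-triangle result d = D - F B / p, where F is the focal length defined
earlier, together with its consequence that a point lying in the object plane
has parallax F B / D. The function name and the sample numbers are
hypothetical.

```python
def relief_from_parallax(D, F, B, p):
    """Relief distance d of a point above the object plane.

    D: distance from the camera to the object plane
    F: focal length of the lens
    B: length of the baseline between exposure positions
    p: measured parallax X - X'
    D and B share one unit; F and p share one unit.
    Assumes untilted (or rectified) photographs taken from equal distance D.
    """
    if p <= 0:
        raise ValueError("parallax must be positive")
    return D - F * B / p

# Illustrative values only: D = 3000 m, F = 0.15 m, B = 1800 m.
plane_parallax = 0.15 * 1800.0 / 3000.0            # point in the object plane
raised_parallax = 0.15 * 1800.0 / (3000.0 - 100.0) # point 100 m nearer camera
```

A point in the object plane yields d = 0, and the slightly larger parallax of
the nearer point yields d = 100 m. The small difference between the two
parallax values shows why parallax must be measured precisely.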

In aerial photography the greatest single obstacle in utilizing the relatively
simple relationships between parallax and relief is the presence of tilt
in  the  photographs.  Terrestrial  photographs  are  easier  to  analyze  since  the 
location  of  the  camera  and  its  pointing  direction  can  be  precisely  measured  at 
the  time  of  exposure  using  standard  surveying  techniques. 

Other  Factors.  There  are  several  other  factors  which  can  cause  image 
displacement  besides  tilt  and  relief.  These  are:  motion  of  the  camera  relative 
to  the  object  during  an  exposure,  inherent  distortion  in  the  camera  lens, 
stability  of  the  principal  point  with  respect  to  the  fiducial  marks  or  other 
references, departure from flatness of the film at the time of exposure,
thickness and resolution of the film, stability of the film against dimensional
changes  during  processing  and  handling,  and  atmospheric  conditions  at  the 
time  of  exposure. 

Photogrammetric Terms. There are several other terms used in photogrammetry
which are of interest. The ratio of the length of the baseline (distance between
exposure stations) to the object distance is called the base-height ratio. The
larger the base-height ratio, the greater the parallax and the smaller the
stereo overlap. In typical aerial photography, base-height ratios of 0.6 are
common. In close range photogrammetry, base-height ratios of 0.2 are common.
The instruments or methods used to make photogrammetric measurements are
usually classified by a quality factor called the "C" factor. The "C" factor
is the ratio of object distance to the accuracy with which relief can be
measured. "C" factors range from approximately 500 to 5000 depending on the
quality of the instrument and the photographs. For example, an instrument
being used to generate terrain elevations from aerial photographs might have a
"C" factor of 1,200. This indicates that the height of the terrain could be
measured to within 10 feet if the aerial photographs were shot from an altitude
of 12,000 feet. Convergent photogrammetry describes photogrammetric
measurements made from photographs taken with the optical axes purposely
tilted. Convergent photogrammetry is used when the base-height ratio is desired
to be large so that parallax is large and easy to measure but the coverage or
overlap of the photographs is also desired to be large. The overlap area of two
photographs defines the area of the stereo model within which parallax can be
determined.
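
The two ratios defined above reduce to simple divisions, sketched here with
hypothetical function names and the figures quoted in the text.

```python
def base_height_ratio(baseline, object_distance):
    """Ratio of baseline length to object distance (same unit for both)."""
    return baseline / object_distance

def relief_accuracy(object_distance, c_factor):
    """Accuracy of relief measurement implied by a "C" factor.

    The result is in the same unit as object_distance.
    """
    return object_distance / c_factor

# "C" factor of 1,200 with photographs taken from 12,000 feet.
accuracy_ft = relief_accuracy(12000.0, 1200.0)
# A 6,000 foot baseline at 10,000 feet gives the typical aerial ratio.
ratio = base_height_ratio(6000.0, 10000.0)
```

accuracy_ft evaluates to 10 feet, as in the example above, and ratio to 0.6.
The 6,000 foot baseline is an illustrative value, not one from the text.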

The area within the stereo model which is actually used for measurements is
called the neat model. Photogrammetric measurements are usually made from
positive transparencies of the negatives exposed in the camera. These are
called diapositives.

Photogrammetric Instruments. The determination of the geometry of a
three-dimensional object from photographs utilizes instruments which fall into
two general classes: instrumental and analytic. Although both classes utilize
instruments, the techniques used to recover the three-dimensional description
from the photographs differ. Both methods require that a sufficiently dense
network of control points or reference points be acquired from the object.
Large-scale (objects are large in photographs) photography generally requires
more control points than small-scale photography.

Instrumental Vs. Analytical Photogrammetry. Instrumental photogrammetry
is defined as the instrumental process of establishing three-dimensional
locations of object points from visual, spatial models. The models are formed
in instruments called stereoplotters which physically reverse the photographic
process to create a small-scale, three-dimensional model of the surface which
was photographed. Measurements can be made directly from the model without
extensive mathematical computation. Figure 5 shows the basic arrangement of
a projection plotter. The two diapositives are mounted in a dual projector.
The projected images are viewed in stereo by an operator.


Figure  5.  Projection  Plotter 


In order to allow only one projected image to be seen by each eye of the
observer the light is usually coded in some way, e.g., by color or polarization.
The images are projected such that the stereo model apparently floats in
space above the surface of a table. In Figure 5 an object A is recorded at
different positions in the two diapositives. Rays from each of the diapositives
of the image of object A will intersect or be coincident at point A in the
stereomodel. By using a table which is capable of measuring X position
(in this simplified two-dimensional example) and an elevation stage capable
of measuring height above the table the coordinates of the point of coincidence
can be directly measured. The true coordinates can then be found by simply
multiplying by the scale of the stereomodel relative to the actual object size.
Instruments are commercially available which generate digitized x, y, z
coordinates of any point in the stereo model within the limits of the
instrument. The limitations imposed by the use of such instruments are dictated
by their physical construction; e.g., mechanical limits of stage movements,
limits on variation of scale, limits on format size, etc.

Analytical instruments determine the object point location by computation
involving the measured location of the image points of the same object in the
two photographs and the various parameters associated with the original
photography.

The advent of relatively inexpensive computers has made analytic
photogrammetry the preferred method in recent years. The fundamental advantage of
analytic  photogrammetry  is  that  there  are  no  limitations  or  restrictions  on 
the  geometry  of  the  original  photography  due  to  the  geometry  of  the  measurement 
process. 

Automatic  Stereocompilation.  The  basic  parameter  desired  to  be  measured 
is  parallax.  This  requires  that  the  same  object  can  be  recognized  by  the 
instrument  in  both  photographs.  In  non-automatic  methods  the  human  operator 
performs  this  recognition  function.  Automatic  instruments,  on  the  other  hand, 
must  perform  the  recognition  function  without  the  aid  of  an  operator.  The 
technique used in automatic instruments is to correlate an area of one
photograph with areas of the other photograph until a correlation peak is
obtained. For objects which lie in a plane or deviate from a plane only by a
small amount, this technique works well and has been implemented in the
generation of terrain relief information from small-scale aerial photographs.
In such
photographs  the  earth's  surface  is  essentially  flat  and  the  appearance  of  an 
object  in  the  two  photographs  is  nearly  identical  allowing  a  high  correlation 
peak  to  be  found.  Problems  arise  with  this  technique  when  high  correlation 
cannot  be  found  or  cannot  be  located  to  the  desired  accuracy  in  the  photograph. 
Some  examples  of  problem  areas  are:  featureless  areas  in  the  photographs, 
such  as  featureless  plains  or  bodies  of  water  in  the  case  of  aerial  photography, 
and  areas  of  high  relief  where  the  differences  between  the  two  photographs  are 
too  great  to  correlate.  An  automatic  instrument  can  be  taught  to  overcome 
these  problems  to  some  degree.  For  example,  a  body  of  water  is  recognized  as 
such  and  assigned  the  elevation  of  its  shoreline.  The  use  of  epipolar 
geometry  also  aids  in  automatic  correlation.  Epipolar  lines  are  lines  on  the 
photographs  defined  by  the  intersection  of  the  photograph  planes  with  planes 
which  contain  both  photograph  perspective  centers  (epipolar  planes).  The 
perspective  center  corresponds  to  the  location  of  the  camera  lens  when  the 
photograph is made. More specifically, the perspective center for the
photograph image is the rear nodal point of the lens. For any particular epipolar
plane, conjugate imagery in the two photographs appears along conjugate
epipolar lines. By scanning the photographs along epipolar lines the image
correlation task is reduced from two-dimensional to one-dimensional.
Automatic stereo compilation equipment is used extensively in the cartographic
community.  In  general,  the  use  is  restricted  to  small-scale  aerial  photography. 
The  "C"  factor  of  such  equipments  is  in  the  range  of  5,000.  Areas  where 
correlation  is  difficult,  due  to  lack  of  features  or  ground  slopes  in  excess 
of  60°,  are  generally  beyond  the  instrument  capabilities  and  require  the 
assistance  of  a  human  operator  to  do  the  recognition  task.  Additional 
detail on the subject of photogrammetry may be obtained from Thompson126 and
Wolf127. Automatic stereo compilation techniques are described by Helava128,
who invented the analytic plotter. Descriptions of commercially available
automatic equipment are also available (Bendix129, Abshier130, Kraus131,
VanWijk132, Allam133, Kelly134). Typical performance of an automatic system
is the production of a 700,000 point digital terrain model and an orthophoto
in 90 minutes (Kelly135).
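
The one-dimensional correlation search along an epipolar line described above can be sketched as follows. This is a minimal illustration rather than the method of any of the cited instruments: the function name `find_parallax`, the window size, and the use of normalized cross-correlation as the correlation measure are all assumptions, and the two scan lines are synthetic.

```python
import numpy as np

def find_parallax(left_row, right_row, center, half_width, max_shift):
    """Return (shift, score): the shift in pixels that best aligns a window
    of left_row with right_row, and its normalized correlation score."""
    window = left_row[center - half_width:center + half_width + 1].astype(float)
    window = window - window.mean()
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        start = center + shift - half_width
        stop = center + shift + half_width + 1
        if start < 0 or stop > len(right_row):
            continue  # candidate window falls off the scan line
        candidate = right_row[start:stop].astype(float)
        candidate = candidate - candidate.mean()
        denom = np.sqrt((window ** 2).sum() * (candidate ** 2).sum())
        if denom == 0:
            continue  # featureless area: correlation is undefined
        score = (window * candidate).sum() / denom
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score

# Synthetic conjugate scan lines: the right line is the left line displaced
# by 7 pixels, standing in for the parallax of a flat object.
rng = np.random.default_rng(0)
left_row = rng.integers(0, 256, size=200)
right_row = np.roll(left_row, 7)
shift, score = find_parallax(left_row, right_row, center=100,
                             half_width=10, max_shift=20)
```

In a featureless window the denominator is zero for every candidate, so no correlation peak is found; this corresponds to the problem areas noted in the text where an operator must assist.
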

Orthophotography. The production of an orthographic perspective view from
a  pair  of  stereo  photographs  is  called  orthophotography.  The  orthophotograph 
is  essentially  a  photograph  in  which  the  image  displacement  due  to  relief  has 
been  removed.  All  object  image  points  in  the  orthophotograph  have  photograph 
coordinates  which  directly  correspond  to  coordinates  of  the  object  in  the 
object  plane.  In  the  case  of  aerial  photography  the  object  plane  coordinates 
might  be  latitude  and  longitude.  In  this  case  relief  is  elevation  and  a 
point  on  the  orthophotograph  is  associated  with  a  single  latitude-longitude 
value  regardless  of  the  terrain  elevation  at  that  point.  Orthophotographs 
can  be  produced  opto-mechanically  or  analytically. 

Digital  Photogrammetry.  Many  digital  computational  techniques  are  being 
investigated  and  implemented  in  the  field  of  photogrammetry.  Most  of  these 
involve  the  conversion  of  a  photograph  into  a  form  recognizable  by  a  computer. 
This  is  generally  accomplished  by  quantizing  the  density  or  transmission  of 
the photograph at discrete, digitized locations in photograph coordinates.
The digitizing and quantizing operations will be discussed in the section on
digital image processing. The resultant digital image is a mathematical entity
which can be manipulated by a computer (Rosenfeld188). Panton136,137 describes
a technique for producing digital terrain models from digital images utilizing
epipolar  geometry  for  correlation  search  strategies.  He  also  recommends  the 
utilization  of  bi-cubic  patch  models  based  on  a  rectangular  grid  for  surface 
description.  Hunt138  presents  a  simplified  theory  of  the  relation  between 
errors  in  calculation  of  terrain  elevation  and  the  observable  parameters  in 
a digitized stereo pair. Keating139 gives the computer memory storage
requirements for a digitized stereo pair of 9" aerial photographs as 5 x 10^8 bits,
together with the resultant model storage requirement of 10^8 bits. Keating140
also describes the procedure to produce an orthophoto from an unrectified aerial
photo and a digital terrain model.

Close Range Photogrammetry (CRP). CRP is not amenable to automatic
operation since the relief is large and corresponding images are usually
too different to be correlated. Large relief also makes analytic
photogrammetry more advantageous than instrumental photogrammetry. Karara141
and Jaksic142 describe the hardware and software available for close range
photogrammetry. An interesting application of CRP is found in Liebes143.

Color Orthophotography. Once the relief of a surface has been determined,
any type of photographic imagery can be transformed into an orthophoto.
Konecny144 describes a procedure for producing true color orthophotos.
Martin145 describes the production of false color orthophotos from visual and
near IR photos.

Non-Metric Cameras. If a sufficient number of control points are known,
the distortions and lack of registration information associated with an ordinary
camera can be compensated for in making photogrammetric measurements. Abdel-Aziz146
has implemented such an approach, concluding that photographs taken
with a $20 hand-held camera were capable of generating position information
accurate to 2 mm for an average object distance of 5 meters ("C" factor of
2,500). His analysis indicates that a minimum of 6 object control points is
required and that nearly co-planar control points should be avoided. This indicates
that, with proper analysis techniques and some knowledge of the dimensions and
location of an object, one can take several photographs of it with any camera
and obtain three-dimensional coordinate information.
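
The "C" factor quoted above can be checked with a short calculation. The sketch assumes that the "C" factor is used here as the ratio of object distance to achievable position accuracy, which is consistent with the figures quoted; the function names are hypothetical, and the extrapolation to another object distance further assumes that accuracy scales linearly with distance.

```python
def c_factor(object_distance_m, accuracy_m):
    """Ratio of object distance to position accuracy (assumed definition)."""
    return object_distance_m / accuracy_m

def expected_accuracy_m(object_distance_m, c):
    """Accuracy expected at another distance, assuming linear scaling."""
    return object_distance_m / c

# Figures from the text: 2 mm accuracy at an average object distance of 5 m.
hand_held_c = c_factor(5.0, 0.002)

# Hypothetical extrapolation to a 20 m object distance.
accuracy_at_20_m = expected_accuracy_m(20.0, hand_held_c)
```
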

FEASIBILITY

An evaluation of stereophotogrammetry applied to generating a polygon
model was made using a scale model of a ship. The scale model was photographed
from two positions using a metric camera. A projection stereoplotter was
then used to digitally determine the three-dimensional coordinates of
operator-chosen vertices of polygons. This data was then manually keyed into a
computer-aided design work station in the format required by the NAVTRAEQUIPCEN CIG
system. (This step would not be necessary if the stereo analysis equipment
were interfaced directly to the modeler's display.) Since the ship model was
symmetrical about a vertical plane, only one side had to be digitized. The
resultant wire frame model is shown in oblique perspective in Figure 6.

A  comparison  of  this  technique  to  manual  digitizing  methods  cannot  be 
made  since  the  cost  of  manually  generating  three-dimensional  vertices  has 
not  been  analyzed.  The  cost  of  the  stereo  method  can  be  estimated  as  follows 
for  a  typical  airport  environment  containing  approximately  10,000  vertices: 
Non-recurring  -  stereo-plotter  and  computer  graphics  display  $100,000; 
recurring  costs  -  aerial  photography  and  ground  control  $5,000;  plotter 
set-up  (36  stereo  pairs)  $1,000  and  digitizing  time  $5,000. 

DIGITAL  IMAGE  PROCESSING 

Basics.  A  digital  image  is  an  image  which  has  been  discretized  both 
in  spatial  coordinates  and  luminance.  A  digital  image  may  be  considered  as 
a  matrix  whose  row  and  column  indices  identify  a  point  in  the  image  and  the 
corresponding matrix element value identifies a gray level at that point.
The elements of such a digital array are called picture elements or pixels
(Gonzalez147). Digital image processing consists of mathematically manipulating
digital  images  to  extract  information.  The  two  principal  applications  of 
digital  image  processing  are  the  improvement  of  pictorial  information  for 
human  interpretation  and  the  processing  of  picture  data  for  autonomous  machine 
perception.  The  basic  elements  of  a  digital  image  processing  system,  utilized 
as  an  aid  to  picture  interpretation,  are  a  digitizer,  a  digital  computer,  and 
a  display.  The  digitizer  converts  the  image  into  a  machine  recognizable 
form, the computer performs the desired mathematical manipulations, and the
display converts the results into an operator readable form.
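
The matrix view of a digital image can be illustrated with a short sketch. It assumes a continuous luminance function defined on the unit square with values between 0 and 1; the function name `digitize` and the choice of sampling at cell centers are assumptions made for illustration only.

```python
import numpy as np

def digitize(luminance, rows, cols, levels):
    """Sample luminance(x, y) on a rows x cols grid over the unit square and
    quantize each sample to one of `levels` integer gray levels."""
    y = (np.arange(rows) + 0.5) / rows   # sample at the center of each cell
    x = (np.arange(cols) + 0.5) / cols
    xx, yy = np.meshgrid(x, y)
    samples = luminance(xx, yy)
    gray = np.floor(samples * levels).astype(int)
    return np.clip(gray, 0, levels - 1)  # keep luminance 1.0 in range

# A horizontal luminance ramp digitized to a 4 x 8 matrix of 8-bit pixels.
image = digitize(lambda x, y: x, rows=4, cols=8, levels=256)
```

The row and column indices of `image` identify a point in the picture, and the stored integer is the gray level at that point.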

147Gonzalez, R. and Wintz, P., "Digital Image Processing", Addison-Wesley
Publishing Co., Reading, Massachusetts, 1977.

Figure 6. Wire Frame Model From Stereophotos

Digitizers. The conversion of a continuous tone image such as a
photograph into a digital image is accomplished by a digitizer. Digitizers are
most commonly either scanning microdensitometers or television cameras.
Microdensitometers are used when high precision is required and the input
image is in the form of a transparency. TV cameras are more flexible in terms
of the form of the input data, less precise, and faster than microdensitometers.
Both types of systems use a photosensor to sense the light level from each
point in the image. The light level is then assigned a quantized gray level
representing the pixel. The gray level assigned can be linearly or
non-linearly related to the output of the photosensor. Therefore, some "analog"
processing can occur before the picture is digitized. The effect of analog
processing on the image is similar to the effect of changing brightness or
contrast on a broadcast television picture. Color digitizing is accomplished
by the use of color filters, monochrome color separations, or color sensitive
photodetectors. The sample or pixel size to which the original imagery is
discretized is not independent of the original imagery. The discrete sampling
of imagery leads to aliasing effects, or the generation of spurious spatial
frequencies, unless the spatial sampling frequency is at least twice the
highest spatial frequency present in the original imagery (Nyquist criterion).
Therefore, the digitizing process must consider aliasing and its effect on the
desired result. Typical microdensitometers can digitize a photograph containing a
quarter of a billion pixels quantized to 256 gray levels (8 bits) in a few
hours. A TV digitizer can digitize a TV frame of approximately one quarter
million pixels in a TV frame time of 1/30 second.
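
The aliasing effect mentioned above can be demonstrated in one dimension. The sketch below samples a sinusoidal pattern of 30 cycles per unit length, first above and then below twice its spatial frequency, and reports the strongest frequency found in the samples. The function name and the particular frequencies are assumptions chosen for illustration.

```python
import numpy as np

def dominant_frequency(signal_freq, samples_per_unit):
    """Sample cos(2*pi*f*x) over one unit length and return the frequency
    (cycles per unit length) of the strongest component in the samples."""
    n = np.arange(samples_per_unit)
    samples = np.cos(2 * np.pi * signal_freq * n / samples_per_unit)
    spectrum = np.abs(np.fft.rfft(samples))
    return int(np.argmax(spectrum))

well_sampled = dominant_frequency(30, 80)    # 80 samples: more than 2 x 30
under_sampled = dominant_frequency(30, 40)   # 40 samples: fewer than 2 x 30
```

With 80 samples the pattern is recovered at 30 cycles per unit length. With 40 samples it appears as a spurious pattern of 10 cycles per unit length, which cannot be distinguished from genuine low frequency detail after digitizing.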

Digital  Computer.  The  function  of  the  digital  computer  in  a  digital 
image  processing  system  is  to  perform  some  operation  or  operations  on  the 
mathematical  entity  which  is  the  digitized  image.  The  operations  range  from 
the relatively simple operations used for image enhancement to extremely
complex operations used for pattern recognition and machine understanding or
artificial intelligence. Pratt148 is an excellent reference for the various
types of operations which are performed on digital images.

Displays. The display of the results of the processing operations is
dependent on the desired end product. For example, a processor might be an
image enhancer whose function is to provide a "better" version of the
input image to the operator. A more complex processor might have the function
of determining whether a particular object's image is contained in the input
image, in which case the output is a yes, no, or maybe. For the modeling
application of interest, the display would most likely be a raster scanned
Cathode Ray Tube (CRT), preferably a color shadow mask type, with some
operator interaction capability.

Image  Enhancement  Operations.  The  most  basic  operations  of  digital  image 
processing  are  those  which  operate  on  single  pixels  without  regard  to  the 
remainder  of  the  pixels  in  the  image. 

Histogram. The production of a histogram is perhaps the most basic
operation. A histogram is simply a count of how many pixels in the image have
a particular gray level. The resultant plot, which may be displayed to the
operator on a CRT display or output in hard copy from a plotter, would
resemble Figure 7.

148Pratt, W., "Digital Image Processing", John Wiley and Sons, New York, 1978.

Figure 7. Histogram

Histogram analyses do convey some information about a digitized image.
For example, the histogram envelope might indicate that the image belongs to
a certain class of images or that the image should undergo additional processing.
In the case of multispectral images, e.g., color separations, each image has its
own histogram.
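
The histogram count described above can be written in a few lines. The sketch assumes the image is stored as an array of non-negative integer gray levels; the function name and the small test image are illustrative only.

```python
import numpy as np

def gray_level_histogram(image, levels=256):
    """Count how many pixels in the image have each gray level."""
    return np.bincount(image.ravel(), minlength=levels)

# A 3 x 3 image quantized to 4 gray levels (0 to 3).
image = np.array([[0, 1, 1],
                  [2, 1, 0],
                  [3, 3, 1]])
hist = gray_level_histogram(image, levels=4)   # counts for levels 0, 1, 2, 3
```

For a multispectral image the same function would be applied to each band separately, giving one histogram per band.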

Contrast Stretching. The digital image process which assigns different
gray levels to each pixel so as to make optimum use of the gray level range of
the display is called contrast stretching. For example, the histogram of an
image (as originally digitized) might be concentrated in only a few adjacent
gray levels. The contrast stretching operation would reassign the few adjacent
levels to displayed gray levels which are spread out to the gray level limits of
the display. Figure 8 shows the result of such an operation by comparing the
histogram of the input image to the histogram of the displayed image.

Figure 8. Contrast Stretching (a) Input Image Histogram (b) Displayed Image Histogram

The result of this operation is to enhance the contrast of the original
image.  Figure  8  represents  a  linear  form  of  contrast  stretching.  Non-linear 
stretching  might  also  be  used.  For  example,  the  histogram  of  the  original  image 
might  contain  two  widely  separated  peaks  with  few  pixels  having  gray  levels 
outside  these  two  regions.  In  this  case  the  maximum  utilization  of  the  displayed 
pixels  might  be  a  linear  stretch  of  the  first  peak  over  half  the  available  gray 
levels  while  the  second  peak  is  spread  over  the  other  half  of  the  available 
display  levels.  Figure  9  shows  the  histograms  of  the  input  and  output  images. 


Figure 9. Non-Linear Contrast Stretching (a) Input Histogram, (b) Output Histogram
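
The linear and two-peak stretching operations described above can be sketched as follows. The function names, the rounding rule, and the use of a single split level to separate the two peaks are assumptions; the two-peak version also assumes that pixels exist on both sides of the split.

```python
import numpy as np

def linear_stretch(image, display_levels=256):
    """Map the occupied gray level range of `image` onto 0..display_levels-1."""
    low, high = int(image.min()), int(image.max())
    if high == low:
        return np.zeros_like(image)   # flat image: nothing to stretch
    scaled = (image.astype(float) - low) * (display_levels - 1) / (high - low)
    return np.rint(scaled).astype(int)

def two_peak_stretch(image, split, display_levels=256):
    """Stretch levels <= split over the lower half of the display range and
    levels > split over the upper half (a piecewise-linear mapping)."""
    half = display_levels // 2
    out = np.empty(image.shape, dtype=int)
    lower = image <= split
    out[lower] = linear_stretch(image[lower], half)
    out[~lower] = half + linear_stretch(image[~lower], display_levels - half)
    return out

# Input concentrated in five adjacent gray levels, as in the linear case.
narrow = np.array([[100, 101, 102],
                   [103, 104, 100]])
stretched = linear_stretch(narrow)

# Input with two widely separated peaks, as in the non-linear case.
two_peaks = np.array([10, 11, 12, 200, 202, 204])
split_stretched = two_peak_stretch(two_peaks, split=100)
```

After the linear stretch the five input levels occupy the full 0 to 255 display range. After the two-peak stretch each peak occupies half of the range and the unused levels between the peaks no longer consume display levels.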

Another  single  pixel  operation  which  is  used  to  modify  the  histogram  of 
a  digitized  image  is  clipping.  Clipping  involves  the  clipping  off  of  the  high 
and  low  gray  levels  and  then  stretching  the  remaining  gray  levels.  Clipping  is 
used  when  there  is  little  or  no  desired  information  in  these  extreme  areas  of 
the  image  histogram.  Clipping  from  just  one  side  of  the  histogram  is  called 
thresholding. 

Operations  which  select  only  a  small  range  of  gray  levels  for  display  are 
called  gray  level  slicing.  The  display  of  a  sliced  image  can  be  stretched  to 
fill  the  available  display  gray  level  range  or  can  be  displayed  as  a  binary 
image where whites or grays are used to display only those pixels in the
original image which are contained in the slice.
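
Clipping and gray level slicing can both be expressed as simple single pixel mappings. The following sketch uses hypothetical function names and limits; the binary slice displays the selected pixels at a single white level, which is only one of the display options mentioned.

```python
import numpy as np

def clip_and_stretch(image, low, high, display_levels=256):
    """Clip gray levels outside [low, high], then stretch what remains
    over the full display range."""
    clipped = np.clip(image, low, high).astype(float)
    scaled = (clipped - low) * (display_levels - 1) / (high - low)
    return np.rint(scaled).astype(int)

def slice_binary(image, low, high, on_level=255):
    """Binary display: on_level where low <= gray level <= high, else 0."""
    inside = (image >= low) & (image <= high)
    return np.where(inside, on_level, 0)

image = np.array([5, 50, 100, 150, 250])
clipped = clip_and_stretch(image, low=50, high=150)
sliced = slice_binary(image, low=90, high=160)
```

Thresholding corresponds to calling `clip_and_stretch` with one of the limits set to the extreme gray level of the image, so that only one side of the histogram is clipped.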

If the display is capable of color, the digitized monochrome image can be
pseudocolored to assist the operator in extracting information. Three-color
displays are also required to simultaneously view a color image generated from
three monochrome images. Level slicing can be accomplished in each image
independently or by considering all three (or more) monochrome digitized images.
These techniques comprise multi-spectral image classification.

The most widespread utilization of the above techniques is to classify
terrain  areas  in  aerial  or  satellite  images  according  to  surface  material.  A 
Landsat  image,  for  example,  contains  approximately  8  million  pixels,  each  of 
which  has  4  gray  levels  associated  with  apparent  brightness  in  4  different 
spectral  bands.  Each  pixel  in  each  band  is  quantized  to  a  precision  of  6  or  7 
bits.  The  total  number  of  different  values  (or  4-vectors)  which  a  pixel  can 
have  is  in  excess  of  100  million.  Jayroe149  in  analyzing  a  typical  Landsat 
image  found  that  one  third  of  the  pixels  were  unique  (there  were  no  other  pixels 
in  the  image  which  had  the  same  4-vector).  Another  eighth  of  the  pixels  were 
duplicated once. The most common vector was associated with 3,000 pixels.
Jayroe stated that some Landsat images, such as those taken of vegetated
terrain during a growing season, had as many as 99% of the pixels with unique
vectors. Despite such classification statistics, an operator can perform
useful classifications to a high degree of accuracy with sufficient iterations on
the type of contrast stretching and level slicing employed. Ground truth
measurements, that is, obtaining information regarding the specific material
contained within the ground area covered by a specific pixel, are important if the
classification is to accurately reproduce the location of real world surface
materials. Problems encountered in classification by histogram analysis are
primarily due to the gray level or multispectral signature (combination of gray
levels from several spectral bands) not being the same for the same material
in different images (such as photographs made at different times, from
different locations, or with different cameras). Even in a single
photograph the same type of material may have different signatures in different
parts of the image (Nagao150).
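
The vector statistics quoted above can be reproduced in outline. The sketch assumes three bands quantized to 7 bits and one band to 6 bits, a combination consistent with the "6 or 7 bits" and "in excess of 100 million" figures in the text; the function name and the four-pixel test image are hypothetical.

```python
import numpy as np

# Assumed precisions: three bands at 7 bits and one band at 6 bits.
bits_per_band = [7, 7, 7, 6]
possible_vectors = 2 ** sum(bits_per_band)   # number of distinct 4-vectors

def vector_statistics(bands):
    """bands: integer array of shape (n_bands, rows, cols). Returns the
    fraction of pixels whose vector occurs only once in the image and the
    number of pixels sharing the most common vector."""
    vectors = bands.reshape(bands.shape[0], -1).T   # one row per pixel
    _, counts = np.unique(vectors, axis=0, return_counts=True)
    unique_fraction = (counts == 1).sum() / vectors.shape[0]
    return unique_fraction, int(counts.max())

# Tiny synthetic image: four pixels, two of which share the same 4-vector.
row = np.array([[1, 1, 2, 3]])
bands = np.stack([row, row, row, row])
unique_fraction, most_common = vector_statistics(bands)
```

Since the number of possible vectors far exceeds the roughly 8 million pixels in an image, a large fraction of unique vectors is to be expected whenever the scene content varies from pixel to pixel.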

Noise Cleaning. Digital images can also be processed by considering the
local neighborhood around a pixel and modifying the gray level of the displayed
pixel based on the gray levels of the pixels around it. By appropriate choice
of algorithms, such operations can low pass filter the image, which minimizes
high spatial frequency noise, or high pass filter the image, which emphasizes
edges and other high spatial frequency information. Another alternative is to
transform the image into a sum of spatial frequency components and display the
pixels which have the particular spatial frequency relations of interest.

149Jayroe, R. and Underwood, D., "Vector Statistics of LANDSAT Imagery",
NASA Tech Memo, TM 78149, December 1977.

150Nagao, M. et al., "Agricultural Land Use Classification of Aerial
Photographs by Histogram Similarity Method", Proceedings of IEEE Computer
Society Conference on Pattern Recognition, pp. 669-672, November 1976.

NAVTRAEQUIPCEN IH-318

Noise in an image is usually of high spatial frequency. Low pass filtering
is accomplished by performing an average or weighted average of the pixel in the
digital image with its 8 nearest neighbors (or 15 or 24 nearest neighbors) and
assigning the corresponding display pixel the average value. For example,
consider a white noise spot (pixel gray level = 256) in a black background
(neighboring pixel gray levels = 1). An equally weighted filter over the
white pixel and its 8 nearest neighbors produces a display pixel of gray level
(256 + 8)/9, or approximately 29. Other noise cleaning masks might assign a
weight of 2 to the pixel of interest and a weight of 1 to the neighboring
pixels, or a weight of 4 to the pixel of interest and weights of 2 and 1 to its
immediate neighbors respectively. Noise cleaning tends to reduce resolution,
since the operation reduces the contrast of high spatial frequency information
as well as the contrast of noise.
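
The worked example above can be reproduced directly. In the sketch below the function name is hypothetical, border pixels are simply left unchanged, and the third mask assumes that the weights of 2 and 1 apply to the edge-adjacent and diagonal neighbors respectively, which the text does not state explicitly.

```python
import numpy as np

def smooth(image, mask):
    """Weighted 3x3 neighborhood average; border pixels are left unchanged."""
    mask = np.asarray(mask, dtype=float)
    out = image.astype(float).copy()
    rows, cols = image.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patch = image[r - 1:r + 2, c - 1:c + 2]
            out[r, c] = (patch * mask).sum() / mask.sum()
    return out

equal_mask = np.ones((3, 3))
center_2_mask = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 1]])
center_4_mask = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]])  # assumed layout

image = np.ones((5, 5), dtype=int)   # black background, gray level 1
image[2, 2] = 256                    # single white noise pixel
cleaned = smooth(image, equal_mask)
```

With the equally weighted mask the noise pixel drops from 256 to about 29, but each of its eight neighbors also rises to about 29, which illustrates the loss of resolution noted in the text. The more heavily center-weighted masks reduce the noise pixel less (to 52 and 64.75 respectively) while disturbing the neighbors less.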

Edge  Enhancement.  Edges  or  high  spatial  frequency  information  in  a  digital 
image  can  be  extracted  or  utilized  to  enhance  the  original  image.  This  is 
accomplished  by  taking  the  difference  between  a  pixel  and  its  neighbors  and 
assigning  the  gray  level  of  the  displayed  pixel  based  on  the  difference.  By 
analogy  to  the  noise  cleaning  operation  a  weighted  average  is  computed  for  each 
pixel  with  negative  weights  assigned  to  the  neighbors  while  a  positive  weight 
is  assigned  to  the  pixel  of  interest.  An  operation  like  this  can  be  utilized 
along  a  row  or  column  of  the  digital  image  matrix  and  so  extract  vertical  or 
horizontal edges independently. Similarly, by appropriate weighting of
neighbors, edges running in any or all directions can be extracted. The operation
which  is  concerned  with  any  difference  is  called  a  gradient  operation  (since 
the  form  of  the  displayed  image  is  approximately  the  gradient  of  the  original 
image).  Applying  the  gradient  operation  to  a  gradient  image  is  called  a 
Laplacian  operation.  Original  images  tend  to  be  subjectively  better  when  their 
own  Laplacian  is  added  to  them.  The  resultant  image  appears  crisper  to  the 
human  observer. 
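
The weighting scheme described above, with a positive weight on the pixel of interest and negative weights on its neighbors, can be sketched as follows. The particular 3x3 mask, the function name, and the step-edge test image are assumptions. Note that this mask has the opposite sign to the discrete Laplacian as it is usually defined, so with this sign convention the filtered image is added to the original to sharpen it.

```python
import numpy as np

def neighborhood_filter(image, mask):
    """Apply a 3x3 mask of weights; border pixels are set to zero."""
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r, c] = (image[r - 1:r + 2, c - 1:c + 2] * mask).sum()
    return out

# Positive weight on the pixel of interest, negative weights on neighbors.
edge_mask = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]])

image = np.zeros((5, 6))
image[:, 3:] = 10.0                  # vertical step edge
edges = neighborhood_filter(image, edge_mask)
crisper = image + edges              # original plus its edge image
```

The filtered image is zero in the flat areas and non-zero only on either side of the step. Adding it to the original produces an undershoot on the dark side and an overshoot on the bright side of the edge, which is what makes the result appear crisper.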

Besides improving the aesthetic qualities of an image, the above operations
also tend to simplify more complex image processing operations which require
some machine understanding of the image structure. The more complex operations
are almost always preceded by the relatively simple operations of histogram
modification, noise reduction, and edge enhancement.

Image  Restoration.  When  the  original  image  has  been  degraded  due  to  some 
known  mechanism,  the  image  can  be  restored  to  some  degree  by  processing  it  to 
remove  the  degradation  due  to  the  known  cause.  Andrews151  and  Pratt152  contain 
detailed  information  on  the  types  of  image  degradation  and  image  restoration 
operations.  Some  sources  of  image  degradation  which  are  amenable  to  restoration 
techniques  are  diffraction  in  the  optical  system,  sensor  non-linearities, 
optical  system  aberrations,  film  non-linearities,  atmospheric  turbulence,  image 
motion  blur,  geometric  distortion,  sensor  or  film  noise,  and  temporal  effects. 
Digitizing the original imagery can also be a source of image degradation which
can be countered in the processing.

In practice a math model of the degradation process is developed and a
process  derived  from  the  model  which  will  invert  the  degradation.  It  is 
apparent  that  such  processing  can  become  extremely  complex  if  many  degradations 
are  present.  The  specific  restoration  processes  are  not  independent  and  the 
order  in  which  they  are  applied  is  important. 

The  conclusion  to  be  drawn  is  that  original  imagery  should  contain  little 
or  no  degradation  and  the  original  image  digitizing  system  should  be  designed 
to  minimize  degradation  in  the  digitizing  process.  The  assumption  that  a 
degraded image can be restored is optimistic, and the better procedure would
be to discard the degraded image and obtain another original when possible and
practical.

Image Understanding. The above operations require no understanding on
the part of the machine. The machine just operates on one image to produce
another. The technology which is involved with the design of machines which
can extract information from the data available, make inferences as to the
high-level structure of the information, test those inferences, and learn from
mistakes is called artificial intelligence (Hunt153, Winston154, Jackson155).
The design of such machines tends to be analogous to mental perception processes.
The digitized image or scene is analyzed into segments or regions based on
some parameter such as gray level (or color) or gray level variation within
the region. Such scene analysis is natural to biological visual perception
processes. Each region is then classified as the image of a known object
(recognition) or as the image of a new object (cognition) or as irrelevant
information. Machines which are capable of such processes have been implemented
with some success in limited applications (Kasvand156). The applications
usually involve scenes of limited complexity or classification to a limited
number of object types.

Segmentation. The data in a digital image is gray level or color for
each pixel. The information which is to be extracted from this data can take
many forms and so there are many varied paths used to arrive at the desired
information. One of the most basic operations in scene analysis is to segment
the scene into regions within which the gray level or some function of gray
level describes the pixels to the desired degree of fidelity. Once a scene
has been segmented, the boundaries of each region can be mathematically described.
The boundaries of a region can be used to describe the shape of a region to the
machine. Shape is a higher order description which may be the desired
information or may be the input into a recognition process which matches or
correlates shapes.

The most elementary segmentation algorithms analyze a scene by dividing
it into regions having the same gray level. In the case of multispectral
images, the regions have the same color. The analysis of Landsat imagery is
amenable to this type of classification (Towles157). Problems arise when
differences in gray level or color do not arise from surface material differences
but from lighting, sensor, or atmospheric effects. A digital image segmented
by gray level into a binary image can be transformed into a line drawing type
of image by a gradient operation. In all but the most simple images, this will
result in the display of many unconnected lines of various lengths and
orientations in which the boundaries of the various scene regions are embedded.
Techniques for eliminating or minimizing the extraneous lines include region
filling. Region filling of the binary gray level slice is accomplished by
changing the value of a pixel if all of its surrounding neighbors have a
different value. This is done prior to the gradient operation to remove single
pixel anomalies. Many similar techniques are utilized in scene analysis
algorithms (Duda158).
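
The region filling rule can be sketched for a binary image as follows. The sketch assumes that the rule is applied only when all eight surrounding neighbors differ from the pixel, and that border pixels are left untouched; the function name is hypothetical.

```python
import numpy as np

def remove_isolated_pixels(binary):
    """Flip any interior pixel whose 8 neighbors all have the other value."""
    out = binary.copy()
    rows, cols = binary.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patch = binary[r - 1:r + 2, c - 1:c + 2]
            neighbor_sum = patch.sum() - binary[r, c]
            if binary[r, c] == 1 and neighbor_sum == 0:
                out[r, c] = 0      # isolated 1 in a region of 0s
            elif binary[r, c] == 0 and neighbor_sum == 8:
                out[r, c] = 1      # isolated 0 (hole) in a region of 1s
    return out

speck = np.zeros((5, 5), dtype=int)
speck[2, 2] = 1                    # single pixel anomaly
hole = np.ones((5, 5), dtype=int)
hole[2, 2] = 0                     # single pixel hole
pair = np.zeros((5, 5), dtype=int)
pair[2, 2] = pair[2, 3] = 1        # two-pixel feature, not an anomaly

filled_speck = remove_isolated_pixels(speck)
filled_hole = remove_isolated_pixels(hole)
kept_pair = remove_isolated_pixels(pair)
```

Only single pixel anomalies are changed; the two-pixel feature survives, so a subsequent gradient operation would still outline it.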

Segmentation can also be accomplished by performing a gradient operation and
then growing a region by connecting the line segments or edges extracted by the
gradient operation so that they surround regions.

Discrimination of regions by texture classification is a more sophisticated
process. Classification of terrain from aerial photographs has been studied
by Weszka159. He found that the best measure of texture was based on second
order statistical differences. Tamura160 analyzed scenes according to those
texture properties which resemble human perception processes. Segmentation and
texture classification have also been described by Tou161 and Haralick162 among
many others.

Once an image has been segmented into meaningful regions, the shape of the
region or the content of the region can be further analyzed.

Pattern  Recognition.  The  technology  of  automatically  recognizing  a  gray 
level  distribution  within  a  region  or  the  shape  of  a  region  as  belonging  to 
one of a number of classes is called pattern recognition. Once a scene has
been segmented, a variety of algorithms are available to assist in the recognition
process (Agrawala163). Davis164 provides a survey of techniques used to find
edges. Hueckel165 and Duda166 describe algorithms for locating lines and edges
based on gray level. Nevatia167 used color for edge detection and scene
segmentation.

The  edges  bounding  a  region  may  be  obvious  to  a  human  operator  but  the 
machine  must  also  be  capable  of  knowing  edges  if  it  is  to  know  the  shape  of 
the  region.  Nevatial68  describes  an  algorithm  which  finds  groups  of  edges  that 
connect  in  a  straight  line  and  then  links  them  to  form  a  boundary.  Agin169 
describes  an  algorithm  which  finds  roads  in  an  aerial  photograph.  Hwang170 
uses  both  global  and  local  edge  information  to  locate  region  boundaries 
enabling  the  machine  to  interpolate  through  image  areas  where  the  edge  may  be 
hidden.  Once  the  machine  has  what  is  essentially  a  line  drawing  of  the  scene, 
it  can  proceed  with  the  recognition  of  description  process.  This  may  involve 
templet  matching  in  which  case  the  machine  has  a  stored  library  of  specific 


163 Agrawala, A., "Machine Recognition of Patterns", IEEE Press, John Wiley
and Sons, Inc., New York, 1976.

164 Davis, L., "A Survey of Edge Detection Techniques", Computer Graphics
and Image Processing, Vol. 4, pp. 248-270, 1975.

165 Hueckel, M., "An Operator Which Locates Edges in Digitized Pictures",
Journal of the Association for Computing Machinery, Vol. 18, pp. 113-125,
January 1971.

166 Duda, R. O. et al., "Use of the Hough Transformation to Detect Lines and
Curves in Pictures", Communications of the Association for Computing Machinery,
Vol. 15, pp. 11-15, January 1972.

167 Nevatia, R., "A Color Edge Detector and its Use in Scene Segmentation",
IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-7, No. 11, pp. 518-524,
November 1977.

168 Nevatia, R., "Locating Object Boundaries in Textured Environments", IEEE
Trans. on Computers, Vol. C-25, No. 11, pp. 1170-1175, November 1976.

169 Agin, G. et al., "Interactive Aids for Cartography and Photo-Interpretation",
ADA056355, June 1978.

170 Hwang, J.; Lee, C.; and Hall, E., "Segmentation of Solid Objects Using
Global and Local Edge Coincidence", in Proceedings of IEEE Computer Society's
Conference on Pattern Recognition and Image Processing, pp. 114-121, August
1979.




templets which can be compared to the regions found. Another alternative is to
utilize templets of generic features to interrogate the image. Each region
shape will have a particular signature when operated on by all of the feature
extractors. If the reference object has been characterized by a particular
feature signature, then regions can be classified by comparing their signatures
with the reference signatures. By application of pattern recognition techniques,
many two-dimensional region shapes can be classified. However, unless all
potential object images are known prior to processing, there will be an
"unknown" class, even if there is no "noise" and there are no mistakes in the
edge extraction process. Despite the fact that scene analysis has not been
perfected for an arbitrary two-dimensional image, investigators have proceeded
to three-dimensional scene analysis.
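
The signature comparison described above can be sketched as a nearest-reference
classifier with a rejection threshold. This is only an illustration: the
feature values, the Euclidean distance measure, and the names classify and
max_distance are assumptions and are not taken from the cited work.

```python
import math

def classify(signature, references, max_distance):
    """Return the label of the nearest reference signature, or "unknown".

    A region whose signature is farther than max_distance from every
    reference falls into the "unknown" class.
    """
    best_label, best_dist = "unknown", math.inf
    for label, ref in references.items():
        dist = math.dist(signature, ref)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "unknown"

# Hypothetical three-component signatures for two reference shapes.
references = {
    "rectangle": (1.0, 0.0, 0.5),
    "circle": (0.0, 1.0, 0.5),
}
near_rectangle = classify((0.9, 0.1, 0.5), references, max_distance=0.3)
far_from_all = classify((0.5, 0.5, 3.0), references, max_distance=0.3)
print(near_rectangle, far_from_all)
```

The threshold makes the "unknown" class explicit: a clean region is still
rejected when nothing in the reference library resembles it.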

Three-Dimensional Scene Analysis. This can be divided into single-view
and multiple-view scene analysis. The single-view class can be based on
matching a stored object description as seen from various viewing directions
against the projected boundary in the available view. Brooks171 describes a
system which is supplied with generic descriptions of objects in a high-level
modeling language (objects are segmented into generalized cones). Views of the
stored objects are then matched to the information obtained from scene
processing. McKee172 describes algorithms which operate on an edge image to
define surfaces bounded by edges. In subsequent views, each edge is compared to
previously stored edges or assigned as a new edge. This system can then
recognize or learn. Hemami173 works from objects whose silhouettes can be found
and compared to the shape of regions. Guzman174 first finds and classifies
vertices (edge intersections) in an image of a group of polyhedrons. By
applying his algorithm, the machine can perform a three-dimensional scene
segmentation. Some investigators have utilized distance information acquired by
some other means, such as a laser rangefinder (Duda175, Nevatia176), to
simplify the task of object recognition or three-dimensional scene analysis.

The three-dimensional scene analysis class which utilizes more than one
view of the scene includes stereo photogrammetry. This class also includes
the description of scenes or recognition of objects whose surfaces are not
all visible in any single one of the multiple views. Zucker177 divides the
scene space into volume elements he calls voxels. His algorithm then determines
the orientation of a plane that separates volumes of different voxels when
different views are used. Della Vigna178 and Henderson179 describe algorithms
which describe three-dimensional scenes composed of planar objects. Della
Vigna states that the most crucial problem is identifying the same vertex in
two different views. Nevatia180 and Lacina181 describe algorithms which track
the same object point in multiple images. Nevatia's technique utilizes many
different views to track the same point, with measurement needing only those
views at the extremes of visibility. Potmesil182 illuminates the object with
a projected grid. His algorithm then finds the same grid intersections in
the different views. This works well as long as all grid intersections are
visible in at least two views. Shapira183 describes an algorithm which
constructs a description of a three-dimensional scene from multiple views by
assuming that objects are composed of planar or quadric surfaces and all
vertices are formed by exactly three faces.

Summary. The application of digital image processing techniques to
environment modeling must be restricted to automated rather than automatic
operation at the present time. Automatic technique development has been and
will continue to be driven by applications where a human operator cannot be
utilized, and so the technology will eventually be available to the CIG modeler
with or without his support. Currently, digital image analysis and processing
techniques can greatly aid the modeler with relatively simple algorithms and
operations which do not require a high degree of machine sophistication. For
example, weighted averaging can allow the operator to observe an object at
various resolution levels; digitizers can determine appropriate colors of scene
elements; three-dimensional geometry of objects or vertices can be determined
from multiple views with the operator indicating the same point in each view.
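
As an illustration of the weighted averaging mentioned above, the following
sketch replaces each k-by-k block of a gray-level image with its weighted mean,
so that repeated application yields successively coarser resolution levels. The
function name reduce_resolution, the non-overlapping block scheme, and the
sample image are assumptions made for the example.

```python
def reduce_resolution(image, weights):
    """Replace each k-by-k block of image with its weighted average.

    image is a list of rows of gray levels; weights is a k-by-k kernel.
    Rows and columns that do not fill a whole block are dropped.
    """
    k = len(weights)
    total = sum(sum(row) for row in weights)
    rows, cols = len(image) // k, len(image[0]) // k
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            acc = sum(
                weights[i][j] * image[r * k + i][c * k + j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(acc / total)
        out.append(out_row)
    return out

image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [90, 90, 30, 30],
    [90, 90, 30, 30],
]
uniform = [[1, 1], [1, 1]]                   # equal weights, one simple choice
half = reduce_resolution(image, uniform)     # 2 x 2 image
quarter = reduce_resolution(half, uniform)   # 1 x 1 image
```

Unequal weights (for example, emphasizing the block center) fit the same
function without change.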

FEASIBILITY 

The digital image processing facility at Kennedy Space Center was visited
and a small experiment was conducted. A color aerial photograph of an urban
area was digitized into three digital images using a television digitizer
with three color filters. The three images were then processed together and
separately to determine whether the machine could readily extract regions which
were very apparent to the human observer. The features chosen were lakes and
major roads. The techniques used were histogram analysis for the lakes and
both histogram analysis and gradient operations for the roads. In neither
case did the processing reduce the time it would have taken for an operator
to manually digitize those particular features using the same image on a
digitizing table. Admittedly, the image was complex and the processing
algorithms utilized were relatively simple. But the fact remains that an
experienced operator of the equipment required at least an hour to create a
major road map on the display. The same map could be created from the original
image and a digitizing tablet in less than 10 minutes. The relative ease with
which colors of regions could be determined does, however, make digital image
processing a viable aid to the modeler.
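
The two kinds of operation used in the experiment can be sketched as follows.
This is not the software used at Kennedy Space Center; it is a minimal
illustration which assumes that water appears darker than its surroundings and
that a threshold is chosen by an operator after inspecting the histogram. All
function names and the sample image are hypothetical.

```python
def histogram(image, levels=256):
    """Count the pixels at each gray level."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts

def threshold_mask(image, threshold):
    """Mark pixels darker than threshold (candidate water pixels)."""
    return [[value < threshold for value in row] for row in image]

def gradient_magnitude(image):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2
            gy = (image[r + 1][c] - image[r - 1][c]) / 2
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

# Bright land on the left, dark water on the right.
image = [
    [200, 200, 200, 40, 40],
    [200, 200, 200, 40, 40],
    [200, 200, 200, 40, 40],
    [200, 200, 200, 40, 40],
]
counts = histogram(image)            # two peaks: gray levels 40 and 200
mask = threshold_mask(image, 120)    # threshold placed between the peaks
edges = gradient_magnitude(image)    # large only along the shoreline
```

On real photography the histogram peaks overlap and the gradient responds to
every textured surface, which is consistent with the modest results reported
above.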

SOFTWARE  TRANSFORMATION 

In  many  cases  the  information  required  to  construct  an  environment  model 
already  exists  in  some  machine  recognizable  form.  The  prime  example  of  this 
is  digital  terrain  models  which  have  been  assembled  for  cartographic  purposes. 

Such digital terrain models are usually derived from automated stereo
photogrammetric instruments in the form of fixed grid point sets. Although the
fixed grid format is used in some digital terrain models, most transform the
fixed grid into an irregular grid. Fowler184 describes a procedure for
transforming a fixed grid to an irregular triangular network. The difficulty
encountered with this procedure is ensuring the capture of important
topographical features such as ridge lines and stream beds. In practice these
must be added manually. Jancaitis185 converts a fixed grid digital terrain
model into polynomial patches using a least squares criterion. Patch size is
determined by how well the polynomial fits the points within the patch. Other
transformation techniques have been or are being developed for the creation of
radar and visual data bases from fixed grid digital terrain models.
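
The idea of sizing a patch by its quality of fit can be illustrated with the
simplest polynomial, a plane z = a + b*x + c*y, fitted by least squares to the
grid points in a candidate patch. This is a sketch under stated assumptions and
is not the procedure of the cited work: the first-degree polynomial, the
tolerance test, and the names fit_plane and max_residual are all hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y to (x, y, z) samples."""
    basis = [(1.0, x, y) for x, y, _ in points]
    normal = [[sum(p[i] * p[j] for p in basis) for j in range(3)]
              for i in range(3)]
    rhs = [sum(p[i] * pt[2] for p, pt in zip(basis, points))
           for i in range(3)]
    return solve(normal, rhs)

def max_residual(points, coeffs):
    """Largest elevation error of the fitted plane over the patch."""
    a, b, c = coeffs
    return max(abs(z - (a + b * x + c * y)) for x, y, z in points)

# A 3 x 3 patch of a fixed grid lying exactly on a plane.
flat = [(x, y, 2 + 0.5 * x - 0.25 * y) for x in range(3) for y in range(3)]
# The same patch with a 3-unit bump at its center post.
bumpy = [(x, y, z + 3 if (x, y) == (1, 1) else z) for x, y, z in flat]

tolerance = 1.0  # hypothetical elevation tolerance
flat_ok = max_residual(flat, fit_plane(flat)) <= tolerance
bumpy_ok = max_residual(bumpy, fit_plane(bumpy)) <= tolerance
```

A patch that fails the test would be subdivided (or given a higher-degree
polynomial) and refitted.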

Since object contours or cross sections can be generated in many ways, a
software transformation from contour information is also desirable. Fuchs186
has developed a technique which will tile the surface of a three-dimensional
object whose cross sections are simple closed curves with triangular polygons.
The tiled surface generated is valid at all of the given contours to a given
precision.
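
The tiling of a surface between two adjacent cross sections can be sketched
with a simple greedy rule: at each step, advance along whichever contour gives
the shorter new edge. This greedy rule is a common simplification and is not
the technique of the cited work. The function name tile_contours and the
sample contours are assumptions.

```python
import math

def tile_contours(lower, upper):
    """Greedy triangle tiling between two closed contours of 3D points.

    Assumes both contours are ordered in the same direction and that
    lower[0] and upper[0] lie close to each other. Produces
    len(lower) + len(upper) triangles, each a tuple of three points.
    """
    m, n = len(lower), len(upper)
    i = j = 0
    triangles = []
    while i < m or j < n:
        a, b = lower[i % m], upper[j % n]
        next_a, next_b = lower[(i + 1) % m], upper[(j + 1) % n]
        advance_lower = j >= n or (
            i < m and math.dist(next_a, b) <= math.dist(a, next_b)
        )
        if advance_lower:
            triangles.append((a, next_a, b))
            i += 1
        else:
            triangles.append((a, next_b, b))
            j += 1
    return triangles

# Two square cross sections one unit apart.
lower = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
upper = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
tiles = tile_contours(lower, upper)
```

Every triangle uses one contour segment and one point of the other contour, so
the tiled surface passes through all of the given contour points.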

Although this report is primarily concerned with the modeling of a real
world environment, there is some advantage to having a system with which the
operator can sculpt objects rather than copy existing objects. Parent187
describes the sculptor's studio environment of the Computer Graphics Research
Group at Ohio State University.

Besides transformation programs, another important software function is the
bookkeeping required to ensure that the modeled environment conforms to the
requirements of the real time CIG system. For example, Monroe188 describes
the limitations imposed on a polygon data base in terms of the maximum number
of potentially visible edges from any one viewpoint as well as the maximum
number of edges per object. The number of closed convex polyhedral objects
which can be grouped to form a model is limited. The number of objects and
the number of models within the field of view and range of view are also
restricted. With proper software parameters, edge, object, and model counts
can be monitored by the machine so that the real time CIG capacity is not
exceeded by the environment complexity.
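
A bookkeeping check of this kind can be sketched as below. The limit values,
the Limits fields, and the nested-dictionary layout are hypothetical, and the
count of "visible" edges is simplified to the sum over all models assumed to be
in view; a real system would take these from the CIG hardware specification.

```python
from dataclasses import dataclass

@dataclass
class Limits:
    max_edges_per_object: int
    max_objects_per_model: int
    max_visible_edges: int

def check(models_in_view, limits):
    """Return a list of limit violations for the models in one view.

    models_in_view maps model name -> {object name: edge count}.
    """
    problems = []
    total_edges = 0
    for model_name, objects in models_in_view.items():
        if len(objects) > limits.max_objects_per_model:
            problems.append(f"{model_name}: {len(objects)} objects")
        for object_name, edges in objects.items():
            total_edges += edges
            if edges > limits.max_edges_per_object:
                problems.append(f"{model_name}/{object_name}: {edges} edges")
    if total_edges > limits.max_visible_edges:
        problems.append(f"view: {total_edges} edges")
    return problems

limits = Limits(max_edges_per_object=16, max_objects_per_model=2,
                max_visible_edges=40)
view = {
    "hangar": {"walls": 12, "roof": 20},
    "tower": {"shaft": 12},
}
violations = check(view, limits)
print(violations)
```

Running such a check after each edit lets the modeler see a violation before
the environment is committed to the data base.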

The development of transformation and bookkeeping software is highly
system specific and is not within the scope of this report.


SECTION V

SYSTEM  RECOMMENDATIONS 


INTRODUCTION 

This  section  discusses  the  recommended  design  of  an  environmental  data 
base  generation  facility  which  utilizes  the  tools  and  techniques  discussed  in 
the  previous  sections.  The  basic  information  to  be  recorded  in  the  data  base 
is  assumed  to  be  geometric  and  appearance  parameters.  Current  systems  which 
are utilized to generate environments will be described, followed by a
discussion of the application of stereo photogrammetric and digital image analysis
techniques  to  automating  the  tedious  operator  tasks.  Finally,  a  recommended 
approach  to  improving  the  efficiency  of  the  environment  data  base  generation 
facility  at  the  NAVTRAEQUIPCEN  will  be  proposed. 

CURRENT  SYSTEMS 

Morland189 describes the environment data base generation system developed
by General Electric for the NAVTRAEQUIPCEN. It was designed to provide the
capability of generating environments specifically for the real time CIG system
at NAVTRAEQUIPCEN. It consists of three major subsystems: a digitizer station,
a non-real-time CIG emulator, and a camera station. The digitizer station
consists of a digitizing table, which provides the two-dimensional coordinates
of a point on the table indicated by an operator with an electronic pen, and an
interactive graphics display system which is capable of displaying perspective
views of wire frame models. The three-dimensional vertex locations and
connectivity of these models are supplied by the operator in one of three ways:
by digitizing several two-dimensional views of a scene (usually orthographic
views in the form of blueprints), by inserting coordinates through a keyboard,
or by modifying a previously digitized vertex by use of the digitizing tablet,
an electronic pen, and a cursor on the display. Once the modeler has generated
and viewed the displayed wire frame model and is satisfied with its appearance,
edge count, and object count, he is ready to record an object or model for the
CIG data base. The information needed for an object description is: a
unique object name, a designator as to whether it is two-dimensional or
three-dimensional, the number of vertices, the number of polygon faces, and the
face data (coordinates of vertices and color). Model descriptions include the
type of model, the names of the objects composing it, the way in which the
objects join, the level of detail as a function of range, the size of the
model, its orientation, and priority information. An environment description
includes the name, the names of the models within the environment, the location
and orientation of the models within the environment, special codes for sun
angle illumination, and surface color blending. The non-real-time CIG emulator
is a minicomputer programmed to duplicate the perspective transformations and
hidden surface algorithms implemented in the real-time CIG hardware at
NAVTRAEQUIPCEN. The rendered imagery is then fed to a film recording station
which utilizes a monochrome, high-resolution CRT imaged onto recording film.
Three exposures through color filters serve to produce a color image for
evaluation.
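
The object, model, and environment descriptions listed above suggest a simple
record layout, sketched below. The class and field names are hypothetical and
do not reproduce the actual file format of the system described; items such as
object joining rules, level of detail, sun angle codes, and color blending are
omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Face:
    vertices: list   # (x, y, z) tuples in order around the polygon
    color: str

@dataclass
class ObjectDescription:
    name: str                 # unique object name
    three_dimensional: bool   # False for a two-dimensional object
    faces: list = field(default_factory=list)

    @property
    def vertex_count(self):
        """Number of distinct vertices used by the faces."""
        return len({v for face in self.faces for v in face.vertices})

    @property
    def face_count(self):
        return len(self.faces)

@dataclass
class ModelDescription:
    name: str
    model_type: str
    object_names: list
    size: float
    orientation_deg: float
    priority: int

@dataclass
class Placement:
    model_name: str
    location: tuple           # (x, y, z) within the environment
    orientation_deg: float

@dataclass
class EnvironmentDescription:
    name: str
    placements: list = field(default_factory=list)

stripe = ObjectDescription(
    name="runway_stripe",
    three_dimensional=False,
    faces=[Face([(0, 0, 0), (30, 0, 0), (30, 1, 0), (0, 1, 0)], "white")],
)
markings = ModelDescription("markings", "surface", [stripe.name],
                            size=30.0, orientation_deg=0.0, priority=1)
airfield = EnvironmentDescription(
    "airfield", [Placement(markings.name, (100.0, 0.0, 0.0), 90.0)]
)
```

Deriving the vertex and face counts from the face data, rather than storing
them separately, avoids one source of inconsistency in the recorded
description.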

Schnitzer190 describes the data base development facility utilized by
Singer-Link to provide environments for the Singer real-time CIG system. He
describes the construction of a CIG environment using the culture files of the
DMAAC DLMS data base (DMA191). The culture files consist of a plan view of
the earth's surface in which polygons define cultural feature boundaries in
a high-level language description. The process utilized by Schnitzer
semi-automatically generates three-dimensional environments from these
two-dimensional polygons. The example cited utilized a culture file of much
higher density than the standard DLMS product: 4300 vertices in one square mile.

The concept of simply "extruding" each polygon into a three-dimensional object
is powerful and can be done fairly automatically for objects with vertical
sides. However, there are many cases where the feature information must be
supplemented by the modeler using other sources.
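
The extrusion concept can be sketched as follows for an object with vertical
sides. The function name extrude, the index-based face lists, and the
assumption of a counter-clockwise footprint resting at zero elevation are
illustrative choices, not details of the process cited above.

```python
def extrude(footprint, height):
    """Extrude a 2D polygon into a prism with vertical sides.

    footprint is a list of (x, y) vertices in counter-clockwise order.
    Returns (vertices, faces); each face lists vertex indices ordered so
    that its normal points out of the solid.
    """
    n = len(footprint)
    bottom = [(x, y, 0.0) for x, y in footprint]
    top = [(x, y, float(height)) for x, y in footprint]
    vertices = bottom + top
    faces = [list(range(n - 1, -1, -1)),   # bottom, facing down
             list(range(n, 2 * n))]        # top, facing up
    for i in range(n):
        j = (i + 1) % n
        faces.append([i, j, n + j, n + i])  # one vertical side per edge
    return vertices, faces

# A 20 x 10 building footprint extruded to a height of 8.
vertices, faces = extrude([(0, 0), (20, 0), (20, 10), (0, 10)], 8)
```

A footprint of n vertices yields 2n vertices and n + 2 faces, so the edge and
face budget of an extruded feature can be predicted from the culture file
before any geometry is generated.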

B ack192 describes the environment generation techniques employed by
Evans and Sutherland for their CIG systems. A digitizing tablet is the prime
source of vertex coordinate determination from which wire frame models are
made. The wire frame model is interactively modified by the modeler. Software
routines are used to transform line drawings to polygons and to define
solid objects.

A  software  routine  allows  the  modeler  to  create  a  polygon  tiled  surface 
of  revolution  by  just  specifying  a  two-dimensional  curve  and  the  number  of 
polygons  desired. 
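
A routine of this kind can be sketched as below. The interpretation of "the
number of polygons desired" as the number of segments around the axis, the
profile given as (radius, height) pairs, and the function name are assumptions
made for illustration.

```python
import math

def surface_of_revolution(profile, segments):
    """Revolve a 2D profile [(radius, height), ...] about the vertical axis.

    Returns (vertices, quads). Each quad is a tuple of four vertex
    indices; there are `segments` quads for each profile interval.
    """
    vertices = []
    for radius, height in profile:
        for s in range(segments):
            angle = 2 * math.pi * s / segments
            vertices.append(
                (radius * math.cos(angle), radius * math.sin(angle), height)
            )
    quads = []
    for ring in range(len(profile) - 1):
        for s in range(segments):
            t = (s + 1) % segments
            a = ring * segments + s
            b = ring * segments + t
            quads.append((a, b, b + segments, a + segments))
    return vertices, quads

# A simple storage tank outline: base, shoulder, and narrower top.
profile = [(1.0, 0.0), (1.0, 2.0), (0.5, 2.5)]
vertices, quads = surface_of_revolution(profile, segments=8)
```

A profile point with zero radius produces quads that collapse to triangles,
which a production routine would need to detect and emit as such.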

The development of three-dimensional graphics environments primarily
for non-real-time graphics systems has been accomplished at several
universities. Clark193 discusses the sculpting methods employed at The
University of Utah. The novel system described employs a head mounted display
and a three-dimensional wand. The sculptor simply moves the wand about to
create the environment which he then observes in three dimensions as his inputs
are rendered. Clark concludes that three-dimensional interaction is far superior
to standard design techniques for three-dimensional environments. Greenberg194
discusses the facilities at Cornell University. He describes four methods of
interactively building an environment: assembling conglomerations of primitive
volumes, utilizing multiple two-dimensional views, utilizing serial
cross-sections, and extruding two-dimensional shapes. He recommends that all of
these tools be available to the modeler. Hackathorn195 describes the facilities
of the Computer Graphics Research Group at the Ohio State University. Based on
the descriptions of existing systems and available data acquisition
technologies, the characteristics of an optimum environment data base
generation facility can be defined together with implementation
recommendations.

INTERACTIVE  SYSTEMS 

Shneiderman196 describes the general policies to be considered in designing
interactive systems. These are summarized in fairly general statements based
on human factors experiments. Interactive systems should be simple to operate
but perform powerful operations. The operational procedures should be easy
to learn and yet appeal to experienced users. Errors should be handled easily
but freedom of expression should not be restricted. The system development
time should be as short as possible with low cost and capability for future
modifications. Although these statements are general, Shneiderman does give
some specific recommendations. The maximum response time should not exceed
one or two seconds for simple user commands. In no case should the response
time exceed 15 seconds. In the case of environmental modeling, a simple command
might be a vertex entry. A complex command might involve changing the viewpoint
and look direction for a rendered environment.

Recommendation. The interactive facility should be capable of rendering a
display of a modified environment in less than 15 seconds. This requirement
precludes the utilization of a film writer as the display means. The preferred
method of display is a CRT monitor having visual characteristics emulating the
display utilized in the real-time system, supported by sufficient computation
power to render an image in less than 15 seconds.


194 Greenberg, D., "An Interdisciplinary Laboratory for Graphics Research and
Applications", Computer Graphics, Vol. 11, No. 2, pp. 90-97, Summer 1977.

195 Hackathorn, R., "ANIMA II: A Three-Dimensional Color Animation System",
Computer Graphics, Vol. 11, No. 2, pp. 54-64, Summer 1977.

196 Shneiderman, B., "Human Factors Experiments in Designing Interactive Systems",
Computer, pp. 9-19, December 1979.




DISPLAY  SYSTEMS 

Latta197 analyzes display design both from the standpoint of the display
operator and from the limitations imposed by state-of-the-art display
hardware. He concludes that a 14-inch square display having 1,024 x 1,024
pixels viewed from a distance of 4 feet meets the acuity and comfortable
viewing distance requirements of the observer. The implementation of this
recommendation requires some tradeoffs based on available displays. A 19-inch
high-resolution color shadow mask CRT having 980 raster lines is probably
the best alternative. A more standard monitor (normal 525 line TV) would
be less expensive but would require a viewing distance beyond the comfortable
range if acuity is to be maintained. A 2,048 x 2,048 display requires an
uncomfortably close viewing distance if the display acuity is to be utilized.
Carlson198 lists 30 requirements for graphics terminals and evaluates available
graphics terminals as inadequate for all but 12 of his requirements. Hubble199
provides a survey and feature comparison of ten commercially available
real-time color digital image displays. The utilization of more than one
display monitor in a system allows stereo viewing or multiple operators. For
stereo viewing on a single monitor, Roese200 describes a field sequential
stereo display which is utilized with PLZT ceramic stereo glasses. The glasses
are commercially available. Ohlson201 surveys various devices for allowing an
operator to interface with the interactive system. These include digitizing
tablets, touch panels, joysticks, and trackballs. From the standpoint of
reducing operator fatigue, horizontal tablets or other devices which allow the
operator to rest his arm are preferable to light pens, which must be positioned
on the vertical surface of the CRT.
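
The display geometry quoted from Latta can be checked with a short
calculation of the visual angle subtended by one pixel. The sketch assumes
square pixels, viewing along the display normal, and (for the 2,048-pixel case)
the same 14-inch display size; the function names are hypothetical.

```python
import math

def pixel_subtense_arcmin(display_size_in, pixels, distance_in):
    """Visual angle of one pixel, in minutes of arc."""
    pitch = display_size_in / pixels
    return math.degrees(2 * math.atan(pitch / (2 * distance_in))) * 60

def distance_for_subtense(display_size_in, pixels, arcmin=1.0):
    """Viewing distance (inches) at which one pixel subtends `arcmin`."""
    pitch = display_size_in / pixels
    return pitch / (2 * math.tan(math.radians(arcmin / 60) / 2))

at_four_feet = pixel_subtense_arcmin(14, 1024, 48)
d_1024 = distance_for_subtense(14, 1024)
d_2048 = distance_for_subtense(14, 2048)
print(round(at_four_feet, 2), round(d_1024, 1), round(d_2048, 1))
```

Under these assumptions a pixel of the 1,024-line display subtends just under
one minute of arc at 4 feet, while a 2,048-line display of the same size
reaches that angle only at roughly 2 feet, which agrees with the remark that
its acuity can be used only at an uncomfortably close distance.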

Recommendation. The operator station should consist of at least two display
monitors: a color monitor having the capability of rendering imagery equivalent
in color and acuity to the display driven by the real-time CIG, and a monochrome
computer terminal type display for alphanumeric interaction with the system.
Operator controls should include a digitizing tablet or joystick as well as
a standard terminal keyboard. The capability to view renderings in stereo can
be achieved through the field sequential stereo method noted above.

STEREOPHOTOGRAMMETRIC EQUIPMENT

Although digital terrain models are available from sources such as the
Defense Mapping Agency and the U.S. Geological Survey, as well as high-level
language descriptions of type and class of cultural features as a function of
geographic location, there are still a large number of shapes in the real world
whose geometric descriptions do not exist in a digitized form. For this
reason, stereophotogrammetric techniques, specifically that class of
photogrammetry known as close range or terrestrial photogrammetry, offer a
means for assisting the modeler. The function of the stereophotogrammetric
equipment would be to produce a geometric description of a specific cultural
object from stereo photographs. In many cases such a specific model could be
used as a generic model which can reside in a feature library to be called up,
modified appropriately, and inserted into the environment. The type of
equipment required includes a camera for initial image acquisition and an
analytical stereo plotter interfaced to the system computer. In operation, the
camera system would be used to take stereo photographs of the desired object
from a sufficient number of viewpoints to ensure complete stereo coverage. A
number of object control points would be recorded at the same time. The
analytical stereo plotter would then be used to create the stereo model. The
operator would input the various control parameters to properly orient the
stereo model. The three-dimensional coordinates of any operator selected point
on the three-dimensional image of the object could then be automatically
recorded. By using the keyboard, the operator can group vertices as belonging
to specific polygons, polygons as belonging to specific objects, and so on.
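
The recovery of three-dimensional coordinates from a stereo pair can be
illustrated with the simplest textbook case, in which the two camera axes are
parallel and perpendicular to the base. An analytical stereo plotter handles
general camera orientations using the recorded control points; the sketch below
covers only this special case, and the function name and sample values are
hypothetical.

```python
def stereo_point(x_left, y_left, x_right, focal_length, base):
    """Object-space (X, Y, Z) of a point from a normal-case stereo pair.

    Image coordinates and focal_length share one unit (for example mm);
    base sets the unit of the result. The origin is the left camera
    station and Z is the distance along the camera axis.
    """
    parallax = x_left - x_right
    if parallax <= 0:
        raise ValueError("point must have positive x-parallax")
    distance = base * focal_length / parallax
    scale = distance / focal_length
    return (x_left * scale, y_left * scale, distance)

# 100 mm focal length, camera stations 2 m apart, image coordinates in mm.
corner = stereo_point(x_left=10.0, y_left=4.0, x_right=5.0,
                      focal_length=100.0, base=2.0)
print(corner)
```

Because distance varies inversely with parallax, a fixed measuring error in
the image produces a larger coordinate error for distant points, which is one
reason a sufficient number of camera stations is needed for complete coverage.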

Although stereophotogrammetric analysis systems based on the utilization
of two television images have been implemented (Yakimovsky202, Liebes203), the
greater resolution of film based systems and the lack of need for rapid raw
data acquisition preclude their use for this application.

The speed with which an operator can digitize stereo models is a strong
function of the relief and relief variation in the model. Speakman204 describes
a task in which 21,035 points were encoded in 56 operator hours. If the stereo
model is amenable to automatic stereo correlation, it would be possible to have
it digitized as a service. Production costs for digital terrain models and
orthophotos (Hagan205, Gockowski206, and Foster207) should not exceed $25 - $50
per square mile when elevations are required on a fixed 30 meter grid.
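
The digitizing rate implied by Speakman's figures follows from simple
arithmetic, shown here as a small calculation. Only the two numbers quoted
above come from the source; the helper operator_hours and the 1,000-point
example are hypothetical, and the rate would differ for models with other
relief characteristics.

```python
points_encoded = 21035
operator_hours_spent = 56

points_per_hour = points_encoded / operator_hours_spent
seconds_per_point = 3600 * operator_hours_spent / points_encoded

def operator_hours(point_count, rate=points_per_hour):
    """Estimated operator hours to encode point_count points."""
    return point_count / rate

print(round(points_per_hour, 1), round(seconds_per_point, 1))
print(round(operator_hours(1000), 2))
```

This corresponds to roughly 376 points per operator hour, or a little under 10
seconds per point.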

Recommendation. Stereo analysis equipment consisting of a metric camera,
surveying instruments (to obtain ground control), an analytical stereo plotter,
and software to interface to the modeling system is recommended. The products
of such a system would be specific object geometric models which can be used as
generic models to form an environment.

DIGITAL  IMAGE  PROCESSING  EQUIPMENT 

The application of digital image processing equipment to interactive
environmental data base development is relatively limited in that the tasks which
are done best by current equipment have, for the most part, been done. The
segmenting of a geographic area into areas labeled with predominant feature
classifications is included in the digital land mass system (DLMS) (DMA208).
The more sophisticated artificial intelligence type of applications are extremely
limited in scope and are best applied to specific pattern or shape recognition
tasks. However, there are some relatively simple applications of digital image
processing which can be utilized by a data base modeler. These include the
use of a digitizer to measure the color of an object surface in a color
photograph, the use of enhancement techniques to make imagery more comfortable
to view or to emphasize classes of features, and the measurement of gray level
variation or texture within an image region corresponding to an object surface.
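
Two of these simple measurements, surface color and gray level variation
within a region, can be sketched as follows. The image layout (rows of
(r, g, b) tuples or gray levels), the region given as a list of (row, column)
positions, and the use of standard deviation as the texture measure are
assumptions made for the example.

```python
from statistics import mean, pstdev

def region_color(rgb_image, region):
    """Mean (r, g, b) over the pixels listed in region as (row, col)."""
    samples = [rgb_image[r][c] for r, c in region]
    return tuple(mean(channel) for channel in zip(*samples))

def region_texture(gray_image, region):
    """Population standard deviation of gray level within the region."""
    return pstdev([gray_image[r][c] for r, c in region])

# A 2 x 2 digitized patch of a roof surface (hypothetical values).
rgb = [
    [(100, 150, 50), (110, 150, 70)],
    [(100, 150, 50), (110, 150, 70)],
]
gray = [
    [10, 20],
    [10, 20],
]
roof = [(0, 0), (0, 1), (1, 0), (1, 1)]  # region outlined by the operator
color = region_color(rgb, roof)
texture = region_texture(gray, roof)
```

Both measurements rely on the operator to outline the region, which keeps the
machine's task well within the simple operations recommended here.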

Andrews209 describes the digital image processing facility at the University
of Southern California Image Processing Institute. Schrock210 and Gambino211
describe the facility at the U.S. Army Engineering Topographic Laboratory.
Faust212 describes the pattern recognition facility at Rome Air Development
Center. Wilson213 describes the system at the Marshall Space Flight Center
used for image enhancement and image restoration. Cunningham214 describes
the five major components of a system as: digitizer (microdensitometer or
television), digital computer, computer to hard copy output, color CRT
monitor, and custom interfaces. Rohrbacher215 describes high-speed image
processing with the STARAN parallel computer. Fanshier216 describes the impact
of currently available hardware on digital image processing systems. Wittig217
describes techniques utilized for the production of 100 land use maps. He
concludes that automatic segmentation is superior to manual digitizing only
if the original photography is very clean to start with. He found that the
major problems associated with manual digitizing were inaccuracies, missing
information, and digitization of the same point twice. Booth218 describes
all the system components in a digital image analysis system and their effects
on end results. He states that an understanding of the long chain of
transducers, signal conditioners, and processors which produced the image to
be analyzed is essential to the analysis task. Reynolds219 describes a
technique for applying generic texture tiles to simulate real world texture.

Recommendation. Digital image processing equipment should be restricted
to a television digitizer (3-color), a frame memory (3-color), and minimal
hardware processing capability at the current time. As processing capability
is driven to more generally applicable systems by other than CIG modeling
requirements, system performance should continue to be reevaluated for future
system improvements.


SECTION VI

SUMMARY  AND  CONCLUSIONS 


SCENE  DETAIL  REQUIREMENTS 

This  report  reviewed  an  extensive  body  of  literature  describing  visual 
capabilities  and  visual  task  performance  in  an  effort  to  quantify  the  fidelity 
of  a  visual  simulation  to  the  real  world.  The  required  fidelity  is  highly 
task  dependent  and  there  is  no  general  rule  which  will  apply  to  all  training 
situations.  A  recommendation  is  made  to  construct  models  which  are  faithful 
to  the  real  world  only  to  the  degree  necessary  to  identify  the  objects  which 
are  relevant  to  the  visual  tasks  to  be  trained.  In  practice  this  could  be 
accomplished  using  photographs  in  which  images  of  objects  are  resolved  only 
to  an  identification  level  as  guides  to  the  modeler. 

DATA  ACQUISITION  AND  REDUCTION 

The  generation  of  environment  models  from  imagery  is  a  difficult  task 
even  when  automated  techniques  are  used.  The  conversion  of  existing  models 
to  the  desired  form  through  the  use  of  transformation  software  is  the 
preferred approach if an existing model is available. The Computer Systems
Laboratory at NAVTRAEQUIPCEN has been developing, and will continue to develop,
this approach to automatic data base generation.

In the case of real world environments which have not been reduced to
models, the techniques of stereophotogrammetry and digital image processing
offer potential improvements to the data base generation process. The
quantification of the improvement in efficiency can be determined by
comparing the actual costs of manual modeling to a modeling procedure
incorporating these techniques when modeling the same environment from the real
world to the same degree of fidelity.

The use of artificial intelligence techniques for converting image
information into a semantic model which then may be converted into a CIG
environment model is a potential future solution to automating the data
base generation process. However, the current state-of-the-art is not
sufficiently developed to apply these techniques to complex visual
environments.